We study the typical case properties of the 1-in-3 satisfiability problem, the boolean satisfaction
problem where a clause is satisfied by exactly one literal, in an enlarged random ensemble
parametrized by average connectivity and probability of negation of a variable in a clause. Random
1-in-3 Satisfiability and Exact 3-Cover are special cases of this ensemble. We interpolate between
these cases from a region where satisfiability can be typically decided for all connectivities in
polynomial time to a region where deciding satisfiability is hard, in some interval of connectivities.
We derive several rigorous results in the first region, and develop the one-step-replica-symmetry-breaking
cavity analysis in the second one. We discuss the prediction for the transition between the
almost surely satisfiable and the almost surely unsatisfiable phase, and other structural properties
of the phase diagram, in light of cavity method results.

PACS numbers: 89.20.Ff, 75.10.Nr, 05.70.Fh, 02.70.-r

I. INTRODUCTION AND MOTIVATION
Classification of the average-case computational complexity of constraint satisfaction problems is
a major task in theoretical computer science. Many problems were successfully analyzed by rigorous
probabilistic methods. However, the average-case complexity remains an open question for most of the
well-known NP-complete problems [1], for example K-satisfiability, Vertex Coloring, and also
1-in-K satisfiability, on the commonly studied random ensembles (sparse random regular or Erdős-Rényi
graphs).

In recent years heuristic methods from statistical physics [3] have allowed us to understand some
average-case properties of large random instances. The aim of these studies was not to prove average
NP-completeness; it was rather to understand why the problems appear hard for some local algorithms,
in some intervals of ensemble parametrization. These efforts culminated in the design of a new polynomial
algorithm, survey propagation [9], which empirically outperforms all the previously known heuristics.
A rigorous understanding of this algorithm is, however, still missing.
The fact that lies behind this success is the intrinsic similarity of combinatorial optimization
problems to the physical systems called spin glasses. The organization of solutions is analogous to the
structure of the free-energy landscape of the physical models. Several phases can be located in the
parameter space, with abrupt transitions between the different phases. An example is the SAT/UNSAT
transition, i.e. the transition from the SATisfiable (SAT) phase where almost every instance is
satisfiable (ground state of energy zero) to the UNSATisfiable (UNSAT) phase where almost every instance
is unsatisfiable (positive ground-state energy). Another is the glassy transition where the phase space
splits into many clusters and metastable states, and where many dynamical procedures (a physical dynamics
or an algorithm) are unable to find the ground state. This connection between the structure of solutions
and the average algorithmic performance was the main motivation for detailed studies of the phase diagram
of K-satisfiability [1], Vertex Coloring and many other problems.

The present study of the phase diagram of the 1-in-K satisfiability problem (sometimes called exact
satisfiability [14]) adds one more item to this list. But prolonging a list is not the main motivation for
this work. 1-in-K SAT on the symmetric ensemble (the probability that a variable in a clause is negated is
1/2) is one of the few NP-complete problems that have been proven to be polynomial on average (Easy).
On the other hand, for the positive ensemble (no negations, equivalent to Exact Cover) no such proof
exists, nor is there a heuristic algorithm with empirically polynomial time performance in the vicinity of
the SAT/UNSAT transition. However, by analogy with K-SAT and coloring, we may expect polynomial
time performance in this region using survey propagation.

Our main motivation is to interpolate between the symmetric and positive ensembles to show how the
phase space changes. For this reason we introduce the ε-1-in-K SAT problem, and study the phase diagram
in the parameters (γ, ε), where ε stands for the probability that a variable in a clause is negated, and
γ is the average connectivity of a variable. To our knowledge, this general ensemble is considered here
for the first time. We generalize the rigorous probabilistic analysis to the general ε case. Then we use
the replica-symmetric and one-step-replica-symmetry-breaking cavity methods [3] to understand more
features of the problem in the whole space of parameters.

Our motivation is similar to the one which led to the introduction of the (2+p)-SAT problem [16, 17, 18],
where the instances are a mixture of 2-SAT and 3-SAT clauses on Erdős-Rényi graphs. A parameter p
interpolates between the ensemble of random 3-SAT formulas, which are known to be computationally hard
in a region including the SAT/UNSAT transition, and random 2-SAT formulas, for which a worst-case
polynomial algorithm exists. A statistical physics approach has been applied to study the (2+p)-SAT
problem; however, only the replica-symmetric solution was investigated. Analogous interpolations between
P and NP-complete cases of some other problems have been investigated in [19].



A. The model 



A factor graph $G = (V_v, V_c; E)$ is a bipartite graph, where the two species of vertices are called
respectively variables $i \in V_v$ and clauses $a \in V_c$. It is a common graphical object used in
computer science to encode the geometrical framework of a combinatorial optimization problem, as it
often allows us to shorten and clarify the "rules of the game".

This is also the case for 1-in-K-SAT. Indeed, both K-SAT and 1-in-K-SAT are examples of
boolean satisfiability problems, and are thus formally inscribed in a framework of boolean logic
expressions: we deal with M boolean clauses over N variables, which should be simultaneously satisfied
(i.e. evaluated to True) for the N-tuple of assignments to the variables to be a solution of the problem
instance. Each clause a involves K out of the 2N literals $x_1, \ldots, x_N, \bar{x}_1, \ldots, \bar{x}_N$
(never $x_i$ and $\bar{x}_i$ simultaneously). While a K-SAT clause is satisfied if at least one of the
involved literals is True, a 1-in-K-SAT clause is satisfied if exactly one of the involved literals is True.

For both problems, a factor graph G (whose clause-nodes a have degree K) and a function
$J : E(G) \to \pm 1$ fully encode an instance: if clause a involves literal $x_i$ or $\bar{x}_i$ we have
an edge $(i,a) \in E(G)$, with $J_{ai} = +1$ or $-1$ respectively. It is customary to draw edges with
J = +1 as solid lines, and edges with J = −1 as dashed lines.

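If we use the common identification with "spin variables" $s_i$

$$x_i = \text{True} \;\longleftrightarrow\; s_i = +1, \qquad x_i = \text{False} \;\longleftrightarrow\; s_i = -1,$$

the function $E^{\rm bool}_{\{J_a\}}(s)$ corresponding to a 1-in-K-SAT clause is

$$E^{\rm bool}_{\{J_a\}}(s) = \begin{cases} \text{True} & s_{i_1} J_{a i_1} + \cdots + s_{i_K} J_{a i_K} = 2 - K \\ \text{False} & \text{otherwise} \end{cases} \tag{1}$$

and we say that s is a solution of the given instance if $\bigwedge_a E^{\rm bool}_{\{J_a\}}(s) = \text{True}$.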
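As an illustration of the spin form of the clause condition (1), the following minimal Python sketch checks single clauses and whole assignments. The function names and the representation of a clause as a pair `(variables, signs)` are assumptions made for this example only; they do not come from the paper.

```python
from itertools import product

def clause_satisfied(spins, signs):
    """Condition (1): a 1-in-K clause holds iff sum_i s_i * J_i == 2 - K."""
    K = len(spins)
    return sum(s * J for s, J in zip(spins, signs)) == 2 - K

def is_solution(assignment, clauses):
    """assignment: dict var -> +1/-1; clauses: list of (variables, signs)."""
    return all(
        clause_satisfied([assignment[v] for v in vs], Js) for vs, Js in clauses
    )

# The satisfying configurations of one positive 1-in-3 clause:
# exactly one spin is +1, so the sum equals 2 - 3 = -1.
sat = [s for s in product((+1, -1), repeat=3) if clause_satisfied(s, (1, 1, 1))]
```

For K = 3 and all J = +1, `sat` contains the three configurations with a single +1 spin.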

1-in-K-SAT is polynomial in the case K = 2 (coinciding with 2-XOR-SAT, or 2-Coloring), while it is
NP-complete for K ≥ 3, even in the restriction to all J's positive (unlike K-SAT).

As far as average-case complexity is concerned, two Erdős-Rényi-like random ensembles of instances are
commonly considered. In both cases we have N variables and every possible clause is present with
probability p such that the average number of clauses is M = Nγ/K, and variables have Poissonian
degree with average γ. Then we distinguish

Positive Poisson ensemble: The edge parameters $J_{ai}$ are all +1. In this case we use a shorthand for
the energy function, $E_{\{J_a = (+,+,+)\}} = E_a$.

Symmetric Poisson ensemble: The edge parameters $J_{ai}$ are random and independent in {±1}, with equal
probability.

The positive version of the problem is the one which corresponds to Exact Cover, in the case of incidence
matrices whose columns have K nonzero entries.

In this paper we study the generalization to the ensemble in which the J's take value ±1
independently, ε ∈ [0, 1/2] being the probability of having J = −1. We call this generalization
ε-1-in-K SAT, in order not to confuse it with 1-in-K SAT, by which is often meant only the symmetric
ensemble. We describe the phase diagram of this problem in the parameters (ε, γ).
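To make the ensemble concrete, here is a hedged Python sketch of an instance sampler. It is a simplification and not the exact construction described above: it draws a fixed number M = round(Nγ/K) of clauses instead of including each possible clause independently with probability p, which gives approximately Poissonian variable degrees of mean γ at large N. The function name `sample_instance` and its signature are hypothetical.

```python
import random

def sample_instance(N, gamma, eps, K=3, seed=0):
    """Draw an (eps, gamma) instance as a list of (variables, signs).

    Assumption: fixed clause number M = round(N * gamma / K), rather than
    independent inclusion of every possible clause.
    """
    rng = random.Random(seed)
    M = round(N * gamma / K)
    clauses = []
    for _ in range(M):
        variables = tuple(rng.sample(range(N), K))   # K distinct variables
        # J = -1 (negated literal) with probability eps, else J = +1
        signs = tuple(-1 if rng.random() < eps else +1 for _ in range(K))
        clauses.append((variables, signs))
    return clauses
```

Setting `eps=0.0` gives the positive ensemble, and `eps=0.5` the symmetric one.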
B. Main results and the paper organization



Throughout this paper we present methods and results relevant to the ε-1-in-K-SAT problem with
K = 3 only. Generalization to instances of larger clause length requires, in most cases, only small
changes in methodology.

In section II and appendices A and B we derive algorithmic and probabilistic bounds, both rigorous, for
the SAT/UNSAT threshold in the ε-1-in-3-SAT problem. The most remarkable result of those sections is
that the bound is tight for ε ∈ [0.2726, 1/2], so in that interval the SAT/UNSAT threshold is rigorously
known. This generalizes the result of [15] for the symmetric ensemble ε = 1/2.

In section III we develop the Replica-Symmetric (RS) solution. First we write the replica-symmetric
equations (III A), then we discuss the zero-temperature limit (III B). We analyze the hard-fields solution
in III C and the soft-fields solution in III D. However, as we show in III E, this solution cannot be
correct (it ceases to be stable) above a certain connectivity not larger than the expected SAT/UNSAT
transition. At this connectivity the belief propagation algorithm would fail to converge. In fact, there
even exists a region in the phase diagram where the RS solution is not stable, and yet the Short Clause
Heuristics (SCH) algorithm is proven to work in polynomial time on average. To our knowledge this does
not happen in any of the previously studied models, and is a point worth further investigation.

In section IV we work out the one-step-Replica-Symmetry-Breaking (1RSB) solution. In this case we
assume the existence of many disconnected clusters of solutions, and many metastable states, which can
actually trap most of the traditional algorithms. We write the general equations in section IV A, then
we concentrate on the zero-temperature zero-energy case (IV B), which leads to the survey propagation
equations. The zero-temperature positive-energy case is studied in appendix C. In appendix D we check
the local stability of the 1RSB solution.

The main result of the 1RSB analysis is the prediction for the SAT/UNSAT threshold, fig. 1. For
ε < 0.07 the 1RSB approach is stable around the SAT/UNSAT line, so the threshold is likely to be exact.
For 0.07 < ε < 0.2726, instead, the 1RSB result is unstable, and a more involved analysis would be
required to locate the SAT/UNSAT threshold exactly (the 1RSB result is expected to be an upper bound).
The presence of a nontrivial 1RSB solution in the small-ε region suggests the presence of a Hard-SAT
region. Details of these results are discussed in section IV C.



FIG. 1: The phase diagram of the ε-1-in-3 SAT problem, as far as the SAT/UNSAT transition is concerned.
The parameters ε and γ describe the probability of negations and the average variable connectivity. For
ε > 0.2726, the threshold is rigorously γ*(ε) = 1/(4ε(1 − ε)) (drawn as a solid line), since the
unit-clause upper bound and short-clause-heuristic lower bound coincide in that region. For ε < 0.2726,
the dot-dashed, dashed and dotted lines denote respectively the SCH lower bound, and the UC and
first-moment-method (1MM) upper bounds. The solid line is our one-step replica-symmetry-breaking (1RSB)
prediction for the SAT/UNSAT threshold. For 0 < ε < 0.07 the 1RSB result is stable (gray shading) and so
the threshold is likely to be exact. For 0.07 < ε < 0.2726 the 1RSB result is unstable, and so the
threshold is just approximate (expected to be an upper bound).
II. RIGOROUS BOUNDS ON THE SAT/UNSAT THRESHOLD



A. Unit-Clause Propagation Analysis 



Unit-Clause (UC) algorithms are a class of randomized algorithms for boolean satisfiability problems
which, when applied to a specific instance, seek a solution or a certificate of unsatisfiability by
assigning variables to ±1 (or "True/False") whilst maximizing the amount of logical deductions coming
from uniquely determined constraints (unit clauses). In the absence of immediate deductions, variables
are fixed by some heuristic rule, and these free steps determine a branching process on the space of
feasible configurations. Our analysis of unit-clause propagation is elucidated in appendix A.
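The following Python sketch illustrates one possible unit-clause procedure for 1-in-K clauses, with free steps taken in a shortest open clause. It is only an illustration under stated assumptions: the clause representation `(variables, signs)`, the function names, the uniformly random choice of the value at a free step, and the simple repeated-scan propagation (not linear time) are all choices made here, not details taken from the paper.

```python
import random

def literal_status(clause, assign):
    """Return (#true literals, list of unassigned (var, sign)) for a clause."""
    vs, Js = clause
    n_true, free = 0, []
    for v, J in zip(vs, Js):
        if v not in assign:
            free.append((v, J))
        elif assign[v] * J == +1:
            n_true += 1
    return n_true, free

def propagate(clauses, assign):
    """Apply forced deductions until none is left; return False on contradiction."""
    changed = True
    while changed:
        changed = False
        for c in clauses:
            n_true, free = literal_status(c, assign)
            if n_true > 1 or (n_true == 0 and not free):
                return False                  # two true literals, or none possible
            if n_true == 1 and free:
                for v, J in free:             # remaining literals must be False
                    assign[v] = -J
                changed = True
            elif n_true == 0 and len(free) == 1:
                v, J = free[0]                # unit clause: last literal must be True
                assign[v] = J
                changed = True
    return True

def sch_run(clauses, n_vars, rng):
    """One run: propagate, then fix a variable of a shortest open clause."""
    assign = {}
    while True:
        if not propagate(clauses, assign):
            return None                       # this run hit a contradiction
        open_clauses = []
        for c in clauses:
            n_true, free = literal_status(c, assign)
            if n_true == 0:                   # after propagation, len(free) >= 2
                open_clauses.append(free)
        if not open_clauses:
            break
        v, _ = rng.choice(min(open_clauses, key=len))
        assign[v] = rng.choice((+1, -1))      # assumption: value chosen at random
    for v in range(n_vars):                   # untouched variables are unconstrained
        assign.setdefault(v, +1)
    return assign
```

A run that returns `None` can simply be restarted with a fresh random state.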



Algorithms based on unit-clause propagation have already been analysed for the problems of symmetric
and positive 1-in-K SAT. For the positive ensemble (ε = 0) the best known lower bound to the
SAT/UNSAT transition is γ = 1.638 [22], and no upper bound is known from unit-clause algorithms.
For the symmetric ensemble (ε = 1/2), the method allows us to determine that the exact SAT/UNSAT
transition is γ = 1 [15]. Here we extend these results to compute the upper bound
$\gamma_{\rm uc}(\epsilon)$ and the lower bound $\gamma_{\rm sch}(\epsilon)$ for general probability of
negation, which describe regions that are almost surely (a.s.) Easy SAT or Easy UNSAT for an instance of
ε-1-in-3-SAT sampled from the random (ε, γ) ensemble (cf. section I A).

In appendix A 1 we demonstrate the upper bound $\gamma_{\rm uc}(\epsilon)$ to the connectivity above
which an Easy UNSAT phase exists. Whenever $\gamma > \gamma_{\rm uc}(\epsilon)$, the instance is a.s.
proven to be UNSAT by a randomized linear-time decimation algorithm in which one tests, for all variables
i, whether both fixing $s_i = +1$ and $s_i = -1$ lead to contradictions through unit-clause implications
alone. This line has the analytic form $\gamma_{\rm uc}(\epsilon) = 1/(4\epsilon(1-\epsilon))$.

In appendix A 2 we obtain the lower bound $\gamma_{\rm sch}(\epsilon)$ to the connectivity below which an
Easy SAT phase exists. Now we perform an extensive number of free choices, and thus we should specify our
heuristic rule. It turns out that, among those tested, the Short Clause Heuristics (SCH; assigning a
variable in one of the shortest clauses remaining) is the one attaining the best bound on the whole
interval of ε. If $\gamma < \gamma_{\rm sch}(\epsilon)$, by fixing variables according to SCH one can
find a solution with finite probability on any run. Restarting the procedure many times allows us to find
a solution in linear time on average. This extends the idea employed for Exact Cover in [22]. At all ε we
find lower bounds by numerical integration (see fig. 1), including $\gamma_{\rm sch}(0) = 1.639$,
consistent with the analysis of [22].

Finally in appendix A 3 we prove analytically that on the interval ε ∈ [0.2726, 1/2] the curves
$\gamma_{\rm uc}(\epsilon)$ and $\gamma_{\rm sch}(\epsilon)$ coincide. The result includes the symmetric
ensemble, for which it was originally proven in [15]. This fact indicates that there exists a region of
the phase diagram in which typical instances of the (ε, γ) ensemble are easily solved, except at the
exactly determined SAT/UNSAT transition line.



B. Upper bounds for small ε



For ε → 0 we have $\gamma_{\rm uc} \to \infty$, and so we would like to find a better upper bound by some
different method. An improvement is obtained through the First Moment Method (1MM) on the 2-core of the
graph. Restriction to the 2-core makes the bound tighter, as it reduces instance-to-instance fluctuations.
This provides a line $\gamma_{\rm 1mm}(\epsilon)$, which is finite everywhere, and thus beats
$\gamma_{\rm uc}$ in some interval of small ε (details are in appendix B 1). The best known upper bound
for the positive ensemble (ε = 0) is γ = 1.932, obtained by a refinement of the first moment method [23].

Still, the first moment method is only probabilistic, and does not allow us to find a certificate in
polynomial time for a given instance. Such a task is achieved at finite γ, also in the region of small ε
and ε = 0, through the embedding of 1-in-3-SAT into an instance of 3-XOR-SAT. While
$E^{\text{1-in-3}}(s_1, s_2, s_3)$ = True on the three configurations $(s_1, s_2, s_3)$ = (+,−,−), (−,+,−)
and (−,−,+), the function $E^{\text{3-XOR}}(s_1, s_2, s_3)$ also allows for $(s_1, s_2, s_3)$ = (+,+,+). So
all the constraints are linear relations, and the problem is formally solved by Gaussian elimination. This
gives an upper bound for an "Easy-UNSAT" phase, independently of ε, at γ = 3α* = 2.754, where α* is the
SAT/UNSAT threshold (clause-to-variable ratio) in 3-XOR-SAT (cf. appendix B 2 for details). Note for
comparison that in random 3-SAT at given finite α (however large) in the UNSAT phase, no polynomial
algorithm is known which can a.s. find a certificate for a typical instance, and intuition strongly
suggests that such an algorithm cannot exist [26].
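The embedding of a 1-in-3 clause into a 3-XOR clause can be checked by direct enumeration. The short Python sketch below does this for every sign pattern J; the function names are hypothetical, and the XOR constraint is written in spin language as "the product of the three literal values equals +1", which is an equivalent way of stating the linear relation mod 2.

```python
from itertools import product

def one_in_three(s, J):
    """Exactly one literal s_i * J_i is +1, i.e. the sum equals -1."""
    return sum(si * Ji for si, Ji in zip(s, J)) == -1

def xor_relaxation(s, J):
    """Linear (mod 2) constraint: the product of the literal values is +1."""
    lit = [si * Ji for si, Ji in zip(s, J)]
    return lit[0] * lit[1] * lit[2] == +1

configs = list(product((+1, -1), repeat=3))
extra = {}
for J in configs:                       # all eight sign patterns
    sat = {s for s in configs if one_in_three(s, J)}
    xor = {s for s in configs if xor_relaxation(s, J)}
    assert sat <= xor                   # every 1-in-3 solution solves the XOR clause
    extra[J] = xor - sat                # what the relaxation additionally allows
```

For each J the set `extra[J]` contains a single configuration, the one in which all three literals are True, so an unsatisfiable XOR system certifies that the 1-in-3 instance is unsatisfiable.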
III. RS CAVITY APPROACH



The cavity method is developed within a statistical mechanics formulation. For this purpose we choose
an integer-valued "energy function" for a single clause, $E_{\{J_a\}}(s)$, to be associated to the original
boolean-valued function $E^{\rm bool}_{\{J_a\}}(s)$ in (1): we simply set $E_{\{J_a\}}(s) = 0$ or 1
respectively if $E^{\rm bool}_{\{J_a\}}(s)$ = True or False. We thus have a Hamiltonian

$$\mathcal{H}(s) = \sum_a 2 E_{\{J_a\}}(s), \tag{2}$$

which counts (twice) the number of contradictions. The factor 2 is a useful convention, so that all the
cavity parameters to be introduced below (cavity fields and biases) will be integer in the
"zero-temperature" limit.

The introduction of the cost function above allows us to define a Gibbs weight $e^{-\beta \mathcal{H}(s)}$,
where β is some parameter (the inverse temperature), so that a single contradiction causes a damping by a
factor $e^{-2\beta}$ in the measure of the configuration. As customary in statistical mechanics, one
introduces a partition function

$$Z(\beta) = \sum_s e^{-\beta \mathcal{H}(s)} \tag{3}$$

and a set of observables (say, probabilities of having patterns A)

$$\text{prob}(A) := \frac{1}{Z(\beta)} \sum_{s:\, A \text{ happens}} e^{-\beta \mathcal{H}(s)}. \tag{4}$$
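For very small instances the quantities in (3) and (4) can be evaluated by brute-force enumeration, which is useful as a reference when testing message-passing code. The sketch below is an assumption-laden illustration: the function names and the `(variables, signs)` clause format are hypothetical, and the cost is exponential in the number of variables.

```python
from itertools import product
from math import exp

def energy(spins, clauses):
    """Number of violated 1-in-K clauses (each clause energy is 0 or 1)."""
    return sum(
        sum(spins[v] * J for v, J in zip(vs, Js)) != 2 - len(vs)
        for vs, Js in clauses
    )

def partition_function(beta, n_vars, clauses):
    """Z(beta) = sum_s exp(-beta * H(s)), with H = 2 * (number of violations)."""
    return sum(
        exp(-2.0 * beta * energy(s, clauses))
        for s in product((+1, -1), repeat=n_vars)
    )

def prob_plus(beta, i, n_vars, clauses):
    """prob(s_i = +1) under the Gibbs measure, as in eq. (4)."""
    weight = sum(
        exp(-2.0 * beta * energy(s, clauses))
        for s in product((+1, -1), repeat=n_vars)
        if s[i] == +1
    )
    return weight / partition_function(beta, n_vars, clauses)
```

For a single positive clause on three variables, three configurations have zero energy and five violate the clause, so Z(β) = 3 + 5 e^{−2β}, and prob(s_i = +1) tends to 1/3 at large β.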

Within this framework the cavity method translates certain obvious recurrence relations for interaction
structures on a factorized graph (a tree) into approximate self-consistent equations for local expectation
values on a graph which is only "locally tree-like" (e.g. a sparse Erdős-Rényi graph at large N, where
loops are expected to arise at lengths of order ln N).

The replica-symmetric (RS) assumption is used at a certain point. It consists in assuming that there
is a single pure state describing the equilibrium behaviour of the ensemble. In turn, it allows us to
neglect certain connected correlation functions. In this section we develop the cavity method under this
hypothesis, while extensions are discussed in later sections.

We will not review in detail all the derivations of the equations; instead we just introduce, in section
III A, some notation on the "easy" case of interaction on a tree, and give without proof the further
formulas which are valid in the various contexts. A heuristic consideration of the complications arising
on a random graph with long loops can be found in the literature.



A. RS cavity equations 

Consider a problem defined on a factor graph G such that, for a certain edge (i, a), G is composed of
factorized components attached to the vertices in a neighbourhood of radius 1 of (i, a). Call
(∂i \ a) the set of other clauses, besides a, neighbouring i, and (∂a \ i) the set of other variables,
besides i, neighbouring a. This description motivates a factor graph of the form shown in fig. 2, left,
where the "gray bubbles" stand for some other parts of the graph, and there are no paths connecting
distinct bubbles, except through the explicitly drawn neighbourhood of (i, a).

For a variable j ∈ (∂a \ i), call $Z_s^{j\to a}$ the quantity Z · prob($s_j = s$) on the system consisting
of the (gray-bubble) subgraph attached to vertex j, see fig. 2, right, Z being the partition function of
this subsystem. Similarly, for a clause b ∈ (∂i \ a), call $Y_s^{b\to i}$ the quantity Z · prob($s_i = s$)
on the system consisting of the subgraph attached to node i through edge (b, i), see fig. 2, right. Then
we have the composition relations (say ∂a \ i = {j, k} for our clause of degree 3)

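$$Z_s^{i\to a} = \prod_{b \in \partial i \setminus a} Y_s^{b\to i}, \tag{5a}$$

$$Y_{s_i}^{a\to i} = \sum_{s_j, s_k} e^{-2\beta E_{\{J_a\}}(s_i, s_j, s_k)}\, Z_{s_j}^{j\to a}\, Z_{s_k}^{k\to a}. \tag{5b}$$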

FIG. 2: Left: Factor graph representing the 1-in-3 SAT problem, in a neighbourhood of an edge (i, a).
Right: Definition of the cavity partition functions.



At this stage we note that, on rewriting the cavity partition sums in terms of probabilities, one obtains
the belief propagation equations [13], familiar to computer scientists.

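As usual in physics we reparametrize the pairs $(Z_+, Z_-)$ in terms of different natural quantities, a
free energy $F = -\frac{1}{\beta}\ln(Z_+ + Z_-)$ and a magnetic field $h = \frac{1}{2\beta}\ln(Z_+/Z_-)$,
the field that, if applied to the variable in substitution of the whole system, would cause the same
average magnetization. We define the cavity fields and cavity biases in the following way

$$h_{i\to a} = \frac{1}{2\beta} \ln\frac{Z_+^{i\to a}}{Z_-^{i\to a}}, \qquad u_{a\to i} = \frac{1}{2\beta} \ln\frac{Y_+^{a\to i}}{Y_-^{a\to i}}. \tag{6}$$

The recursion equations for h and u then follow from (5):

$$h_{i\to a} = \sum_{b \in \partial i \setminus a} u_{b\to i}, \tag{7a}$$

$$u_{a\to i} = \frac{1}{2\beta} \ln \frac{\sum_{s_j,s_k} \exp\left[\beta\left(h_{j\to a}s_j + h_{k\to a}s_k - 2E_{\{J_a\}}(+1,s_j,s_k)\right)\right]}{\sum_{s_j,s_k} \exp\left[\beta\left(h_{j\to a}s_j + h_{k\to a}s_k - 2E_{\{J_a\}}(-1,s_j,s_k)\right)\right]}. \tag{7b}$$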



One can think of the u's and h's as messages attached to the edges of the graph, and oriented (the u's
towards the variable, the h's towards the clause). Then, the update functions (7) are represented on the
variable- and clause-nodes, as in the figure below.

[Figure: graphical representation of the updates (7) on variable- and clause-nodes.]
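The two message updates can be sketched in a few lines of Python. The code assumes the standard belief-propagation form of the recursion for a clause of degree 3: the field is the sum of the incoming biases, and the bias is 1/(2β) times the logarithm of the ratio of the two partial sums over the neighbouring spins. Function names and argument conventions are hypothetical.

```python
from itertools import product
from math import exp, log

def clause_energy(s, J):
    """E = 0 if exactly one literal s_i * J_i is +1, else 1 (K = 3)."""
    return 0 if sum(si * Ji for si, Ji in zip(s, J)) == -1 else 1

def field_update(incoming_biases):
    """h_{i->a}: sum of the biases u_{b->i} coming from the other clauses b."""
    return sum(incoming_biases)

def bias_update(beta, h_j, h_k, J):
    """u_{a->i} for clause a = (i, j, k) with signs J = (J_ai, J_aj, J_ak)."""
    def weight(s_i):
        # partial sum over the two other spins, at fixed s_i
        return sum(
            exp(beta * (h_j * s_j + h_k * s_k
                        - 2 * clause_energy((s_i, s_j, s_k), J)))
            for s_j, s_k in product((+1, -1), repeat=2)
        )
    return log(weight(+1) / weight(-1)) / (2.0 * beta)
```

Flipping the sign J_ai flips the sign of the returned bias, and for strongly negative incoming fields on a positive clause the bias approaches +1 at large β.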



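Similarly, we handle the free energies. First define the accessory quantities

$$Z^{i} = \prod_{a \in \partial i} Y_+^{a\to i} + \prod_{a \in \partial i} Y_-^{a\to i}, \tag{8a}$$

$$Z^{a+\partial a} = \sum_{s_i,s_j,s_k} e^{-2\beta E_{\{J_a\}}(s_i,s_j,s_k)}\, Z_{s_i}^{i\to a}\, Z_{s_j}^{j\to a}\, Z_{s_k}^{k\to a}. \tag{8b}$$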


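Then we write the free-energy shift $\Delta F^{a+\partial a}$ after adding a clause a and connecting all
its neighbours i ∈ ∂a, and the free-energy shift $\Delta F^{i}$ after connecting all the components
incident on variable i:

$$-\beta \Delta F^{a+\partial a} = \ln \frac{\sum_{s_i,s_j,s_k} \exp\left\{\beta\left[h_{i\to a}s_i + h_{j\to a}s_j + h_{k\to a}s_k - 2E_{\{J_a\}}(s_i,s_j,s_k)\right]\right\}}{\prod_{i\in\partial a}\prod_{b\in\partial i\setminus a} 2\cosh(\beta u_{b\to i})}, \tag{9a}$$

$$-\beta \Delta F^{i} = \ln \frac{2\cosh\left(\beta \sum_{a\in\partial i} u_{a\to i}\right)}{\prod_{a\in\partial i} 2\cosh(\beta u_{a\to i})}. \tag{9b}$$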

(9b) 
Finally, for the free energy, $F = -\frac{1}{\beta}\ln Z(G)$, of a tree graph one gets

$$F(\beta) = \sum_a \Delta F^{a+\partial a} - \sum_i (d_i - 1)\, \Delta F^{i}, \tag{10}$$

where $d_i$ is the degree of variable i. Writing this in terms of fields we note the cancellation of the
factors $2\cosh(\beta u_{a\to i})$ between the denominators of (9a) and (9b). Furthermore, in the numerator
of (9b), the combination $h_i := \sum_{a\in\partial i} u_{a\to i}$ appears. This is the "total field"
parameter for the magnetization of variable i in the cavity approximation.

The free energy as a function of the inverse temperature β on a given graph allows us to determine,
by Legendre transform, the number exp(S(E)) of configurations of given energy E (number of violated
clauses). Both S and E are extensive, i.e. of order N, and corrections decreasing as 1/N are understood:

$$E(\beta) = \frac{\partial\, (\beta F(\beta))}{\partial \beta}; \qquad S(E) = \beta(E)\,\left(E - F\right). \tag{11}$$
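A small numerical check may help to see how the Legendre transform works. The Python sketch below assumes the sign convention βF = −ln Z and takes as input a toy spectrum given as a dictionary {energy: degeneracy}; both the function names and this input format are hypothetical. The derivative in β is taken by a central finite difference.

```python
from math import exp, log

def beta_F(beta, levels):
    """beta * F = -ln Z for a spectrum given as {energy: degeneracy}."""
    return -log(sum(g * exp(-beta * e) for e, g in levels.items()))

def energy_and_entropy(beta, levels, d=1e-5):
    """E = d(beta F)/d(beta) by central difference, and S = beta * (E - F)."""
    E = (beta_F(beta + d, levels) - beta_F(beta - d, levels)) / (2 * d)
    S = beta * E - beta_F(beta, levels)
    return E, S

# Spectrum of H for one positive 1-in-3 clause: three satisfying
# configurations (H = 0) and five violating ones (H = 2).
single_clause = {0: 3, 2: 5}
```

At β = 0 the entropy is ln 8 (all configurations), while at large β it tends to ln 3, the logarithm of the number of solutions.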

The main insight here is that we can think of the set of cavity fields as a parameterization of the local
"magnetizations" of variables i (i.e. the probability of being $s_i = +1$, for two-state variables), in a
system in which the interaction of i with a neighbouring clause a has been modified (the cavity system).
If the clause a has been "switched off", the nodes in the neighbourhood of a become well separated on the
graph which effectively describes the cost function. An assumption of decorrelation of variables, which is
exactly true for variables in disconnected components, and approximately valid for variables sufficiently
far away on the graph, away from a critical temperature and within a pure thermodynamic phase, provides us
with self-consistent equations for the cavity fields. The equations are exactly the same as those we wrote
for factorized graphs and hold at leading order in the system size N, in particular eq. (10). The
cavity assumption can be checked self-consistently, as we will describe in section III E.

B. The zero-temperature limit: Hard and soft fields 

In the limit of zero temperature, β → ∞, the update of cavity biases (7) simplifies significantly to

$$h_{i\to a} = \sum_{b\in\partial i\setminus a} u_{b\to i}, \tag{12a}$$

$$u_{a\to i} = \frac{1}{2}\Big[\max_{s_j,s_k}\big(h_{j\to a}s_j + h_{k\to a}s_k - 2E_{\{J_a\}}(+1,s_j,s_k)\big) - \max_{s_j,s_k}\big(h_{j\to a}s_j + h_{k\to a}s_k - 2E_{\{J_a\}}(-1,s_j,s_k)\big)\Big]. \tag{12b}$$

It is immediately seen that, as $E_{\{J_a\}}(s)$ takes values in {0, 1}, it is self-consistent to assume
that h ∈ ℤ and u ∈ {−1, 0, +1}.

In fact the only characteristic properties of $E_{\{J_a\}}(s)$ we need are that $E_{\{J_a\}}(s) = 0$ if and
only if s satisfies clause a, and that $E_{\{J_a\}}(s_1, s_2, s_3) - E_{\{J_a\}}(-s_1, s_2, s_3) \in \{-1, 0, +1\}$
(a kind of discrete "Lipschitz" condition), which clearly holds for our choice of Hamiltonian $\mathcal{H}$.

The only other choice of Hamiltonian for 1-in-3-SAT sharing this property is

$$\mathcal{H}'(s) = 2\sum_a E'_{\{J_a\}}(s), \tag{13}$$

where $E'_a(s)$ coincides with $E_a(s)$ except on (+, +, +), where it takes the value E' = 2 instead of 1
(because 2 flips are required in order to satisfy the clause).
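The zero-temperature bias update can be explored numerically. The Python sketch below assumes the max-based form of the update for integer fields, namely half the difference between the best value reachable with s_i = +1 and the best value reachable with s_i = −1. The parameter `all_true_cost` (a hypothetical name) switches between the energy function that assigns 1 to the all-True configuration and the alternative one that assigns 2.

```python
from itertools import product

def clause_energy(s, J, all_true_cost=1):
    """0 if exactly one literal is True; all_true_cost if all three are; else 1."""
    n_true = [si * Ji for si, Ji in zip(s, J)].count(+1)
    if n_true == 1:
        return 0
    return all_true_cost if n_true == 3 else 1

def warning(h_j, h_k, J=(1, 1, 1), all_true_cost=1):
    """Zero-temperature bias u_{a->i} for integer fields h_j, h_k."""
    def best(s_i):
        return max(
            h_j * s_j + h_k * s_k
            - 2 * clause_energy((s_i, s_j, s_k), J, all_true_cost)
            for s_j, s_k in product((+1, -1), repeat=2)
        )
    # the difference is always an even integer, so the division is exact
    return (best(+1) - best(-1)) // 2
```

On a positive clause, two negative incoming fields give u = +1, a single positive one (with the other non-positive) gives u = −1, and two positive fields give u = 0 with the first energy function but u = −1 with the alternative one.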

The fact that h, u ∈ ℤ is much more than self-consistent: it is necessary, even for the "true" cavity
fields (the ones that we would find from the evaluation of global partition functions, instead of the ones
being solutions of the cavity equations), and approximately true in a whole region of large β (it suffices
that β ≫ ln N). Let us concentrate first on $h_{i\to a}$ and say that
$Z_+^{i\to a} = \sum_n g_+(n)\, e^{-2\beta n}$, where the integer coefficients $g_+(n)$ count the
configurations with n violated clauses, in the proper cavity system labeled by (i → a). There will be a
certain value $n_+$ corresponding to the first non-vanishing coefficient $g_+(n)$. Identical definitions
are assumed for + ↔ −. Then we have

$$h_{i\to a} = (n_- - n_+) + \frac{1}{2\beta}\ln\frac{g_+(n_+)}{g_-(n_-)} + O(e^{-2\beta}). \tag{14}$$
So, at all orders in a purely algebraic expansion in powers of 1/β we only have two terms: the first one,
$(n_- - n_+)$, the hard field, is constrained to be integer; the coefficient of the second term, the
soft field, being the logarithm of the ratio of two integers (potentially large in N), is simply taken
as a value in ℝ.

In particular the ground-state energy, the β → ∞ limit of eq. (11), can be computed using only the hard
fields. On the other hand, to compute the ground-state entropy the soft fields are necessary (the general
relation follows from (11), but is rather lengthy).

We will see in the next section that working only with the hard fields has huge computational advantages;
however, as they do not contain all the information, we come back to the soft fields in section III D.



C. The hard-fields analysis: Warning Propagation 

In this section we return to equations (12), and neglect for the moment the 1/β part of the field. The
corresponding equations are called warning propagation, and the discrete set of possible values for the
biases takes an interpretation in terms of "kinds of warnings":

$u_{a\to i} = -1$: Clause a tells variable i: "I think you should be −1"
$u_{a\to i} = 0$: Clause a tells variable i: "I can deal with any value you take"
$u_{a\to i} = +1$: Clause a tells variable i: "I think you should be +1"

The analogous interpretation for the fields h is

$h_{i\to a} < 0$: Variable i tells clause a: "I would prefer to be −1"
$h_{i\to a} = 0$: Variable i tells clause a: "I don't have any strong preferences"
$h_{i\to a} > 0$: Variable i tells clause a: "I would prefer to be +1"

From this, the prescriptions (12) on how to update the "warnings" over the graph also become intuitive.

We now determine statistics over the ensemble of random pairs $(G, \{J_a\})$. It turns out that, although
the fields h can take infinitely many values, by virtue of the Poisson ensemble the equations are closed
under a finite number of parameters. We define the probabilities $p_-/p_+/p_0$ that cavity fields h are
negative/positive/zero, and similar probabilities $q_-/q_+/q_0$ that biases u are −1/+1/0. Then, the
statistical average of equations (12), seen as defining a dynamics of time evolution for the distributions
of the fields, gives the following dynamical map over $\mathbf{q} = (q_+, q_-)$ (the auxiliary vector
$\mathbf{p} = (p_+, p_-)$ is also defined)

$$\mathbf{q} = M \mathbf{q}'; \qquad \mathbf{q}' = \left(p_-^2,\; 2p_+ - 2p_+^2\right); \tag{15a}$$

$$\mathbf{p} = M \mathbf{p}'; \qquad \mathbf{p}' = \left(f(\gamma q_+, \gamma q_-),\; f(\gamma q_-, \gamma q_+)\right); \tag{15b}$$

where we recall that γ is the mean degree of a variable, while the matrix M describes the probability of
negations

$$M = \begin{pmatrix} 1-\epsilon & \epsilon \\ \epsilon & 1-\epsilon \end{pmatrix}. \tag{16}$$

Finally the function f(r, s) gives the probability that the difference of two Poissonian-distributed
integers (with rates r and s respectively) is positive

$$f(r,s) := \sum_{m=0}^{\infty} \sum_{n=m+1}^{\infty} \text{Poiss}_r(n)\,\text{Poiss}_s(m); \qquad \text{Poiss}_a(n) := e^{-a}\frac{a^n}{n!}. \tag{17}$$

The "paramagnetic" state $\mathbf{q} = 0$ is everywhere a solution of (15). It is however numerically
unstable above the line $\gamma_{\rm uc} = 1/(4\epsilon(1-\epsilon))$ (coinciding with the unit-clause
upper bound). A non-paramagnetic solution $\mathbf{q} \neq 0$ appears continuously above this line and is
stable. Conversely, for ε < ε*, the non-paramagnetic solution appears discontinuously at a connectivity
$\gamma^*_{\rm RS}$, and is a stable local attractor.
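The dynamical map is easy to iterate numerically. The Python sketch below is a minimal illustration under stated assumptions: the negation matrix is taken to be M = [[1−ε, ε], [ε, 1−ε]], the Poisson sums in f(r, s) are truncated at `n_max` terms (adequate only for moderate rates), and all function names are hypothetical.

```python
from math import exp

def poisson_pmf(rate, n_max):
    """Poisson weights for n = 0..n_max, built iteratively (no factorials)."""
    pmf = [exp(-rate)]
    for n in range(1, n_max + 1):
        pmf.append(pmf[-1] * rate / n)
    return pmf

def f(r, s, n_max=80):
    """Probability that Poisson(r) exceeds an independent Poisson(s)."""
    pr, ps = poisson_pmf(r, n_max), poisson_pmf(s, n_max)
    total, cdf_s = 0.0, 0.0
    for n in range(1, n_max + 1):
        cdf_s += ps[n - 1]              # P(m <= n - 1)
        total += pr[n] * cdf_s
    return total

def step(q, gamma, eps):
    """One application of the map to q = (q_plus, q_minus)."""
    q_plus, q_minus = q
    pp = (f(gamma * q_plus, gamma * q_minus), f(gamma * q_minus, gamma * q_plus))
    p_plus = (1 - eps) * pp[0] + eps * pp[1]
    p_minus = eps * pp[0] + (1 - eps) * pp[1]
    qp = (p_minus ** 2, 2 * p_plus - 2 * p_plus ** 2)
    return ((1 - eps) * qp[0] + eps * qp[1], eps * qp[0] + (1 - eps) * qp[1])

def iterate(gamma, eps, q0=(1e-3, 1e-3), n_iter=2000):
    """Iterate the map from a small perturbation of the paramagnetic state."""
    q = q0
    for _ in range(n_iter):
        q = step(q, gamma, eps)
    return q
```

Linearizing `step` around q = 0 gives a growth factor 4γε(1−ε) per iteration, so for ε = 1/2 a small perturbation decays for γ < 1 and grows towards a nonzero fixed point for γ > 1.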

The line $\gamma^*_{\rm RS}(\epsilon)$, and even the "triple point" ε* at which $\gamma^*_{\rm RS}(\epsilon)$
touches $\gamma_{\rm uc}$, depend on the choice of Hamiltonian $\mathcal{H}$ (2) or $\mathcal{H}'$ (13), and
are plotted in fig. 3. Since the ground-state energy E(β → ∞) (11) is zero if and only if
$\mathbf{q} = 0$, we conclude that the line $\gamma^*_{\rm RS}$ for ε < ε*, and $\gamma_{\rm uc}$ for
ε > ε*, is the replica-symmetric prediction for the SAT/UNSAT threshold.
FIG. 3: Results of the replica-symmetric cavity analysis and its stability. The continuous curves with no
bullets are the rigorous bounds, kept for comparison. The line with diamonds corresponds to the
replica-symmetric prediction for the SAT/UNSAT threshold derived from the Hamiltonian $\mathcal{H}$, with
$E_a(+,+,+) = 1$, while the line with triangles corresponds to the one derived from the Hamiltonian
$\mathcal{H}'$, with $E'_a(+,+,+) = 2$. The dotted line is the soft-fields instability of the
replica-symmetric solution; it separates the stable region (below) from the unstable one (above). For
ε > 0.33 ± 0.02 the stability line seems to coincide with the SAT/UNSAT threshold.

[Figure 3 labels: marked values 0.2277, 0.2429, 0.2726; region "Easy UNSAT"; legend: RS SAT/UNSAT with $\mathcal{H}$, RS SAT/UNSAT with $\mathcal{H}'$, RS instability with $\mathcal{H}$.]


There are striking hints that the replica-symmetric solution is flawed. For example, the predicted
critical value $\gamma^*_{\rm RS}(\epsilon = 0)$ is different from the numerical one, and even larger than
the rigorous upper bound. But there is also an inconsistency internal to the method. The satisfiability
line in the phase diagram cannot depend on the finite-temperature Hamiltonian used in the cavity
equations, but the lines $\gamma^*_{\rm RS}$ coming from $\mathcal{H}$ and $\mathcal{H}'$ are different.
Another argument comes from the cavity prediction of the ground-state energy
$E_{\min} = E(\beta \to \infty)$ (11), which is zero if and only if $\mathbf{q} = 0$. The discontinuous
appearance of a new fixed point leads to a discontinuity in γ of $E_{\min}(\gamma, \epsilon)$, but this is
impossible, as we have the Lipschitz condition

$$\frac{d}{d\gamma} E_{\min}(\gamma,\epsilon) \in [0, 1], \tag{18}$$

coming from the fact that adding M′ random clauses to an instance whose minimum energy is $E_{\min}$
can only give an instance with minimum energy in the range $[E_{\min}, E_{\min} + M']$. In section III E we
will explain in detail why and where exactly the replica-symmetric solution breaks down.



D. The soft-fields analysis 



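In the region of the phase diagram where the RS hard-fields analysis predicts zero ground-state energy
(SAT region) all the hard fields and biases are zero ($g_+(0), g_-(0) > 0$ in the language of equation (14)).
We denote with capital letters $U_{a\to i}$ and $H_{i\to a}$ the soft fields, i.e.

$$U_{a\to i} = \lim_{\beta\to\infty} \beta\, u_{a\to i}, \qquad H_{i\to a} = \lim_{\beta\to\infty} \beta\, h_{i\to a}. \tag{19}$$

Their update is deduced by analyzing the general cavity equations (7):

$$H_{i\to a} = \sum_{b\in\partial i\setminus a} U_{b\to i}, \tag{20}$$

$$J_{ai} U_{a\to i} = -\frac{1}{2}\ln\left(e^{2J_{aj}H_{j\to a}} + e^{2J_{ak}H_{k\to a}}\right). \tag{21}$$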
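Defining $q(U)$ and $p(H)$ as the probability distributions of U and H over the graph, we have the
self-consistent equations

$$\tilde p(H) = (1-\epsilon)\,p(H) + \epsilon\,p(-H)\,; \tag{22}$$

$$p(H) = \sum_{k=0}^{\infty} e^{-\gamma}\frac{\gamma^{k}}{k!}\int\prod_{i=1}^{k}\left[dU_i\,\tilde q(U_i)\right]\,\delta\Big(H - \sum_{i=1}^{k}U_i\Big)\,; \tag{23}$$

these being the analogue of (15b), while for the analogue of (15a) one has

$$\tilde q(U) = (1-\epsilon)\,q(U) + \epsilon\,q(-U)\,; \tag{24}$$

$$q(U) = \int dH_j\,dH_k\;\tilde p(H_j)\,\tilde p(H_k)\;\delta\Big(U + \tfrac{1}{2}\ln\left(e^{2H_j} + e^{2H_k}\right)\Big)\,. \tag{25}$$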

These equations are already beyond the possibilities of an analytical treatment, and can be solved only 
by a population-dynamics technique Q. 
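A minimal population-dynamics sketch for the soft-field equations is given below. It is an illustration under stated assumptions, not the authors' code: the clause update is taken to be $U = -\frac{1}{2}\ln(e^{2H_j} + e^{2H_k})$, cavity fields are sums of a Poisson($\gamma$) number of biases, and each message changes sign with probability $\epsilon$ on each edge it crosses. The names `poisson`, `clause_update` and `population_dynamics`, and the default population size and number of sweeps, are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Poisson sample by Knuth's multiplication method (adequate for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def clause_update(h_j, h_k):
    """U = -1/2 ln(e^{2 H_j} + e^{2 H_k}), evaluated in a log-sum-exp form."""
    top = max(h_j, h_k)
    return -top - 0.5 * math.log(math.exp(2 * (h_j - top)) + math.exp(2 * (h_k - top)))

def population_dynamics(gamma, eps, size=2000, sweeps=50, seed=0):
    """Return a population of biases U representing q(U) (assumed update rules)."""
    rng = random.Random(seed)
    flip = lambda x: -x if rng.random() < eps else x  # negation with probability eps
    pop = [0.0] * size
    for _ in range(sweeps * size):
        # Two cavity fields, each the sum of a Poisson(gamma) number of biases.
        h = [sum(flip(rng.choice(pop)) for _ in range(poisson(gamma, rng)))
             for _ in range(2)]
        # Replace a random member of the population by the new bias.
        pop[rng.randrange(size)] = clause_update(flip(h[0]), flip(h[1]))
    return pop
```

A histogram of the returned list approximates $q(U)$; the population size and the number of sweeps control the statistical accuracy.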

In this "paramagnetic" region the expression for the RS ground-state entropy (logarithm of the number
of SAT configurations) simplifies. Thus, knowing the distributions $q(U)$ and $p(H)$ we can compute the
average entropy. In appendix B we compute the average number of solutions $\langle\mathcal{N}\rangle$ and hence the annealed entropy
$\ln\langle\mathcal{N}\rangle$, whereas the RS computation leads to the quenched average of the entropy, $\langle\ln\mathcal{N}\rangle$. The same quantity
may be computed for a given large sparse graph from the message-passing procedure (21). However, the
result will be valid only in the region where the RS assumption is valid, see section III E.



E. The RS stability 



The replica-symmetric solution will turn out to be incorrect if the assumption of having a single pure 
phase is proven to fail. As we know, a necessary condition for this is that fields incoming to a given node 
are uncorrelated. This property can be tested on the RS solution: if the spin-glass susceptibility diverges, 
then all the fields (and in particular the pairs of incoming ones) are strongly co-fluctuating, and the RS 
assumption is inconsistent. The (nonlinear) spin-glass susceptibility is defined as [2^. [soj 



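$$\chi_{SG}(G) = \frac{1}{N}\sum_{i,j\in V(G)}\langle s_i s_j\rangle_c^{2}\,; \qquad \chi_{SG} = \sum_{d=0}^{\infty}(2\gamma)^{d}\,\mathbb{E}\left(\langle s_0 s_d\rangle_c^{2}\right). \tag{26}$$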



On the left is the definition for a fixed graph G, where $\langle s_i s_j\rangle_c$ is the connected correlation function between
nodes i and j. On the right we consider the average over instances, in the thermodynamic limit, where
sites $s_0$ and $s_d$ are at distance d. The factor $(2\gamma)^d$ stands for the average number of neighbours at distance
d, when $d \ll \ln N$. Assuming that the limit for large d of the summands in (26) exists (with the limit
$N \to \infty$ performed first), we relate it to the stability parameter:



$$\Lambda = \lim_{d\to\infty}\,(2\gamma)\left(\mathbb{E}\left(\langle s_0 s_d\rangle_c^{2}\right)\right)^{1/d}. \tag{27}$$



Then the series in (26) is essentially geometric, and converges if and only if $\Lambda < 1$.
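To make the geometric-series statement explicit, suppose that for large d the summand of the susceptibility behaves as $(2\gamma)^d\,\mathbb{E}(\langle s_0 s_d\rangle_c^2) \simeq A\,\Lambda^d$, where the d-independent prefactor $A$ is an assumption introduced here for illustration only. Then

```latex
\chi_{SG} \simeq \sum_{d=0}^{\infty} A\,\Lambda^{d} = \frac{A}{1-\Lambda}
\quad (\Lambda < 1),
\qquad
\chi_{SG} = \infty \quad (\Lambda \ge 1).
```

so the susceptibility diverges as $\Lambda \to 1^-$, which is the RS instability criterion used in this section.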

Using the fluctuation-dissipation theorem we relate the correlation $\langle s_0 s_d\rangle_c$ to the variation of the magnetization
of $s_0$ caused by an infinitesimal magnetic field acting on $s_d$. Then, one relates this quantity to the cavity
fields, i.e., up to a factor C independent of d,



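$$\mathbb{E}\left(\langle s_0 s_d\rangle_c^{2}\right) = C\;\mathbb{E}\left[\left(\frac{\partial U_{a_0\to 0}}{\partial U_{a_d\to d}}\right)^{2}\right]. \tag{28}$$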



Finally, using the fact that we perform the large-N limit first, the variation above is dominated by the
direct influence through the length-d path between the two nodes, and this induces a "chain" relation: if
the path involves the clause and variable nodes $(a_d, d, a_{d-1}, d-1, \ldots, a_0, 0)$ we have



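$$\mathbb{E}\left(\langle s_0 s_d\rangle_c^{2}\right) = C\;\mathbb{E}\left[\prod_{\ell=1}^{d}\left(\frac{\partial U_{a_{\ell-1}\to \ell-1}}{\partial U_{a_\ell\to \ell}}\right)^{2}\right]. \tag{29}$$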



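[Figure: paths through a clause node a; the labels mark the incoming messages from $\{b\in\partial j\setminus a\}$ and from $\{c\in\partial k\setminus a\}$, the latter denoted $\{u_{c\to k}\}$.]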



Inside the paramagnetic phase and in the zero-temperature limit (but keeping the soft fields of section III D),
from (21) we have, for a path like the upper one in the above figure,



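$$\frac{\partial U_{a\to i}}{\partial U_{b\to j}} = -J_{ai}J_{aj}\,\frac{e^{2J_{aj}H_{j\to a}}}{e^{2J_{aj}H_{j\to a}} + e^{2J_{ak}H_{k\to a}}}\,. \tag{30}$$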



Instead of directly computing the stability parameter $\Lambda$, it is equivalent, and numerically easier, to associate
every bias $U_{a\to i}$ in the population-dynamics algorithm with a positive number $v_{a\to i}$. We update this
number together with the fields according to



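$$v_{a\to i} = \sum_{b\in\partial j\setminus a}\left(\frac{\partial U_{a\to i}}{\partial U_{b\to j}}\right)^{2} v_{b\to j} + \sum_{c\in\partial k\setminus a}\left(\frac{\partial U_{a\to i}}{\partial U_{c\to k}}\right)^{2} v_{c\to k}\,. \tag{31}$$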



(we recall that, in the population-dynamics technique, the labels "$a \to i$" do not have any
spatial meaning: the population is just a collection, and messages are randomly combined at each step).
After equilibration, the numbers v will change on average geometrically, with a factor $\Lambda$.
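The procedure can be sketched as follows for the positive problem ($\epsilon = 0$). This is an illustration under assumptions, not the authors' implementation: the update is taken to be $U = -\frac{1}{2}\ln(e^{2H_j} + e^{2H_k})$, so that the squared derivatives are $(e^{2H_j}/(e^{2H_j}+e^{2H_k}))^2$ and its counterpart for k; the population is updated in parallel sweeps and the numbers v are renormalised to mean 1 after each sweep, so the mean of the new v estimates the growth factor. All function names and default parameters are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Poisson sample by Knuth's multiplication method (adequate for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def update_with_weights(h_j, h_k):
    """Return U = -1/2 ln(e^{2H_j} + e^{2H_k}) with (dU/dH_j)^2 and (dU/dH_k)^2."""
    top = max(h_j, h_k)
    e_j, e_k = math.exp(2 * (h_j - top)), math.exp(2 * (h_k - top))
    u = -top - 0.5 * math.log(e_j + e_k)
    w_j = e_j / (e_j + e_k)
    return u, w_j ** 2, (1.0 - w_j) ** 2

def stability_parameter(gamma, size=1000, warmup=20, measure=20, seed=0):
    """Estimate the growth factor of the numbers v per sweep (epsilon = 0 only)."""
    rng = random.Random(seed)
    pop = [(0.0, 1.0)] * size            # pairs (bias U, attached number v)
    logs = []
    for sweep in range(warmup + measure):
        new = []
        for _ in range(size):
            fields, noise = [], []
            for _ in range(2):           # the two other variables j, k of the clause
                picks = [rng.choice(pop) for _ in range(poisson(gamma, rng))]
                fields.append(sum(u for u, _ in picks))
                noise.append(sum(v for _, v in picks))
            u, w_j, w_k = update_with_weights(fields[0], fields[1])
            new.append((u, w_j * noise[0] + w_k * noise[1]))
        mean_v = sum(v for _, v in new) / size
        if mean_v == 0.0:
            return 0.0
        if sweep >= warmup:
            logs.append(math.log(mean_v))  # old mean is 1, so this is the log growth
        pop = [(u, v / mean_v) for u, v in new]
    return math.exp(sum(logs) / len(logs))
```

The returned value is a geometric mean of the per-sweep growth factors; values above 1 would signal the RS instability.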

Numerically, we see that for a given $\epsilon$ the stability parameter grows with connectivity $\gamma$. In fig. 3 we
see the line above which the replica-symmetric solution is unstable (i.e. $\Lambda > 1$). This line coincides with
the unit-clause upper bound, within the errors, for $\epsilon > 0.33 \pm 0.02$. In particular, the whole region
where the RS results are contradictory is unstable. It is furthermore remarkable (and unexpected) that
there exists a region in which the RS solution is not stable, and yet the short clause heuristics a.s. finds a
solution in polynomial time.



IV. 1RSB CAVITY APPROACH AND ITS IMPLICATIONS FOR THE PHASE DIAGRAM

The understanding of the role of ergodicity in the validity of the replica-symmetric cavity assumption
allows us to recast the cavity method as a more powerful tool, for the case in which there are (exponentially)
many phases. The assumptions underlying this process go under the jargon term "1RSB" type of symmetry
breaking [3, ?]. In the 1RSB approach we assume that exponentially many pure thermodynamical states
(phases) exist, and that the neighbors of a node in absence of this node are uncorrelated only within
each of these states. This happens because the cluster property (the small correlation of observables far
from each other) holds only within a pure phase. The name replica symmetry breaking is due to historical
reasons, since the mechanism was first proposed by Parisi [?], while using the "replica trick" in analysing
a spin-glass model.

The necessary but conceptually impossible handling of the multiplicity of pure phases may be replaced
by a "survey" over these phases. The only memory of the original structure is through the free energies $F_\alpha$
of the various phases $\alpha$. As the phases have to be weighted with a Boltzmann weight in $F_\alpha$, a reweighting
term has to be introduced in the "survey" equations, as we show in section IV A. In the zero-temperature
limit the analysis leads to what are now called survey propagation equations, which are developed fully in
section IV B.

The solution to these equations is described in section IV C 1; section IV C 2 contains the results of the stability
analysis of this solution. Checking the validity of the 1RSB cavity assumption (1RSB stability analysis)
gives us a hint as to whether this solution could be the final correct one. This has been done for the K-SAT [?, ?] and
Coloring [?] problems on random graphs. The results in those cases strongly supported the conclusion
that the SAT/UNSAT thresholds computed with the 1RSB cavity method were exact. The 1RSB stability
analysis is technically involved; we summarize the main steps in appendix D.
A. General 1RSB cavity equations

We define the complexity function $\Sigma(F)$ as the logarithm of the number of states with free energy F; hence it is computable
by a Legendre transform

$$-\beta m\,\Phi(m,\beta) = -\beta m\,F(\beta) + \Sigma(F)\,; \qquad \frac{\partial\Sigma(F)}{\partial F} = \beta m\,, \tag{32}$$

where the parameter m plays the role of a second temperature, for free energies of states instead of energies of
configurations, and is called the Parisi parameter of the replica symmetry breaking. The function $\Phi(m,\beta)$
is called the "replicated free energy".

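Instead of one field and one bias on every edge, we now need to keep one field and one bias for every
edge and every state or, equivalently, since we assumed that there is a huge number of states, a distribution
of fields and biases on every edge. The self-consistent equation for this distribution is [1, ?]

$$\mathcal{P}^{a\to i}(u_{a\to i}) = \frac{1}{\mathcal{Z}^{a\to i}}\int\prod_{b\in\partial j\setminus a}\left[du_{b\to j}\,\mathcal{P}^{b\to j}(u_{b\to j})\right]\prod_{b\in\partial k\setminus a}\left[du_{b\to k}\,\mathcal{P}^{b\to k}(u_{b\to k})\right]\,\delta\left(u_{a\to i} - \hat u(\{u_{b\to j}\},\{u_{b\to k}\})\right)\,e^{-\beta m\,\Delta F^{a\to i}}\,. \tag{33}$$

The function $\hat u$ is the single-phase update of biases given by equations (7), and the last term is the
reweighting of states, where $\Delta F^{a\to i}$ is the free-energy shift after adding clause a and all its neighbors
except i. Referring to our calculations in equation (9a), this free-energy shift is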



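$$e^{-\beta\,\Delta F^{a\to i}} = \frac{\cdots}{\prod_{j\in\partial a\setminus i}\prod_{b\in\partial j\setminus a}\left(2\cosh\beta u_{b\to j}\right)}\,. \tag{34}$$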

Then, in analogy with (11), the replicated free energy $\Phi$ is calculated as

$$\Phi(m,\beta) = \sum_{a}\Delta\Phi^{a\cup\partial a} - \sum_{i}(d_i - 1)\,\Delta\Phi^{i}\,, \tag{35}$$

where $d_i$ is the degree of the node i. The replicated free-energy shifts are



The integrals average over the distributions of the fields incoming to the cavity, similarly
to equation (33), i.e. $\int f(u_1,\ldots,u_k) = \int du_1\,\mathcal{P}^{(1)}(u_1)\cdots\int du_k\,\mathcal{P}^{(k)}(u_k)\,f(u_1,\ldots,u_k)$.

Since we are interested mainly in the ground-state properties of the $\epsilon$-1-in-3 SAT problem, we need to
take the zero-temperature limit. There are two standard ways of doing this:

The energetic $T \to 0$ limit [4]: We take the limit $\beta \to \infty$, $m \to 0$ with $y = \beta m$ fixed and finite. Then

$$-y\,\Phi(y) = -yE + \Sigma(E)\,. \tag{37}$$

Here we neglect the entropic contribution, and we obtain the complexity as a function of energy.
This is the 1RSB analog of the RS analysis with hard fields (warnings).

The entropic $T \to 0$ limit [33]: We take the limit $\beta \to \infty$ at energy fixed to zero, $E = 0$. Then

$$m\,\Phi(m) = mS + \Sigma(S)\,; \tag{38}$$

where $-\beta\Phi(\beta, m) \to \Phi(m)$. Here the energy is fixed to zero, but on the other hand we can compute the
complexity (number of states) as a function of the internal entropy of the states. This is the 1RSB analog
of the RS analysis with soft fields (beliefs).
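As a worked step (standard Legendre-transform algebra, not spelled out in the text), the energetic relation $-y\Phi(y) = -yE + \Sigma(E)$, combined with the conjugacy condition $\partial\Sigma/\partial E = y$ (the energetic counterpart of $\partial\Sigma/\partial F = \beta m$, assumed here), can be inverted to read off the energy and the complexity from $\Phi(y)$:

```latex
\frac{\partial\,[y\,\Phi(y)]}{\partial y}
  = E + y\,\frac{\partial E}{\partial y} - \frac{\partial\Sigma}{\partial E}\,\frac{\partial E}{\partial y}
  = E,
\qquad
\Sigma(E) = yE - y\,\Phi(y) = y^{2}\,\frac{\partial\Phi(y)}{\partial y}.
```

Scanning y then traces out the curve $\Sigma(E)$ parametrically.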
In this paper we work out only the simpler energetic limit, the same as in [?] for K-SAT, or in [?] for
Coloring. We will see how this analysis alone already gives us a large amount of information about the
phase diagram.

The reweighting (34) becomes, in the zero-temperature energetic limit,

$$\Delta E^{a\to i} = -\max_{s_i,s_j,s_k}\left(h_{j\to a}s_j + h_{k\to a}s_k - E_{\{J_a\}}(s_i,s_j,s_k)\right) + \sum_{b\in\partial j\setminus a}|u_{b\to j}| + \sum_{b\in\partial k\setminus a}|u_{b\to k}|\,. \tag{39}$$

Since the fields h and biases u are integers, by relations (7), $\Delta E^{a\to i}$ also takes nonnegative integer values.
In fact it counts the number of contradictions in one warning-propagation update.

We want to determine whether a typical $\epsilon$-1-in-3 SAT instance has any satisfying configuration. We do
this in section IV B by taking the $y \to \infty$ limit. The reweighting term $\exp(-y\,\Delta E^{a\to i})$ then guarantees
that we keep only the cases without contradictions, $\Delta E^{a\to i} = 0$. Conversely, in order to compute the
ground-state energy in the UNSAT region, or the complexity at energies higher than zero, we need to
keep y finite. We undertake this in appendix C.

B. The zero-energy case, survey propagation 

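In the limit $y \to \infty$ we fix the energy to be zero. The energy shift $\Delta E^{a\to i}$ is zero if and only if:

• The $\{u_{b\to j}\}_{b\in\partial j\setminus a}$ are all nonnegative or all nonpositive,

• The $\{u_{b\to k}\}_{b\in\partial k\setminus a}$ are all nonnegative or all nonpositive,

• The terms $J_{aj}\sum_{b\in\partial j\setminus a}u_{b\to j}$ and $J_{ak}\sum_{b\in\partial k\setminus a}u_{b\to k}$ are not both positive.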
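The three conditions translate directly into a small check. The sketch below is illustrative only: the function names are hypothetical, biases are integers, couplings are $\pm 1$, and the third condition is read as "the two terms are not both positive", i.e. the two other literals are not both forced to be true.

```python
def same_sign(biases):
    """True if the integer biases are all >= 0 or all <= 0 (an empty list passes)."""
    return all(u >= 0 for u in biases) or all(u <= 0 for u in biases)

def zero_energy_shift(J_aj, u_into_j, J_ak, u_into_k):
    """Check the three conditions for a vanishing energy shift of clause a.

    J_aj, J_ak : couplings (+1 or -1) of variables j and k in clause a
    u_into_j   : biases u_{b->j} for b in the neighbourhood of j other than a
    u_into_k   : biases u_{b->k} for b in the neighbourhood of k other than a
    """
    if not (same_sign(u_into_j) and same_sign(u_into_k)):
        return False                      # contradiction on variable j or k
    both_forced_true = (J_aj * sum(u_into_j) > 0) and (J_ak * sum(u_into_k) > 0)
    return not both_forced_true           # two true literals violate 1-in-3
```

For instance, `zero_energy_shift(1, [1, 0], 1, [-1])` holds, whereas two positive incoming fields on non-negated variables give a contradiction.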

We begin by simplifying the form of eq. (33). In the zero-temperature energetic limit we can write the
distribution of fields and biases over states on every edge as

$$\mathcal{P}^{a\to i}(u_{a\to i}) = q_-^{a\to i}\,\delta(u_{a\to i} + 1) + q_+^{a\to i}\,\delta(u_{a\to i} - 1) + q_0^{a\to i}\,\delta(u_{a\to i})\,; \tag{40}$$

$$\mathcal{Q}^{i\to a}(h_{i\to a}) = p_-^{i\to a}\,\rho_-(h_{i\to a}) + p_+^{i\to a}\,\rho_+(h_{i\to a}) + p_0^{i\to a}\,\delta(h_{i\to a})\,; \tag{41}$$

where $q_-^{a\to i} + q_+^{a\to i} + q_0^{a\to i} = p_-^{i\to a} + p_+^{i\to a} + p_0^{i\to a} = 1$, and $\rho_\pm(h)$ are normalized measures with support
over $\mathbb{Z}^\pm$.

So, to every oriented edge we associate a triple of numbers $q = (q_-, q_0, q_+)$ or a triple $p = (p_-, p_0, p_+)$
(respectively, if oriented towards the variable or the clause). In analogy with the self-consistent equations (7) for
fields and biases we can write self-consistent equations for the probabilities (surveys) q and p.




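Considering only the combinations with $\Delta E^{a\to i} = 0$, the surveys of fields are given by the incoming surveys
of biases as

$$p_+^{i\to a} + p_0^{i\to a} = \mathcal{N}_{i\to a}^{-1}\prod_{b\in\partial i\setminus a}\left(q_+^{b\to i} + q_0^{b\to i}\right); \tag{42a}$$

$$p_-^{i\to a} + p_0^{i\to a} = \mathcal{N}_{i\to a}^{-1}\prod_{b\in\partial i\setminus a}\left(q_-^{b\to i} + q_0^{b\to i}\right); \tag{42b}$$

$$p_0^{i\to a} = \mathcal{N}_{i\to a}^{-1}\prod_{b\in\partial i\setminus a} q_0^{b\to i}\,; \tag{42c}$$

where $\mathcal{N}_{i\to a}$ is the normalization factor (in the update, we have three equations for three independent
unknowns, $p_\pm$ and $\mathcal{N}$). The surveys of biases are given by the incoming surveys of fields as

$$q_+^{a\to i} = \mathcal{N}_{a\to i}^{-1}\;p_-^{j\to a}\,p_-^{k\to a}\,; \tag{43a}$$

$$q_-^{a\to i} = \mathcal{N}_{a\to i}^{-1}\left(p_+^{j\to a}\,(1 - p_+^{k\to a}) + (1 - p_+^{j\to a})\,p_+^{k\to a}\right); \tag{43b}$$

$$q_0^{a\to i} = \mathcal{N}_{a\to i}^{-1}\left(p_0^{j\to a}\,p_0^{k\to a} + p_0^{j\to a}\,p_-^{k\to a} + p_-^{j\to a}\,p_0^{k\to a}\right); \tag{43c}$$

where $\mathcal{N}_{a\to i}$ is the normalization factor, and the lower indices of the q's and p's are multiplied by $-1$ when the
variable is negated in the clause.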

These equations describe survey propagation, and can be used inside an algorithm to find a solution
to a typical instance of $\epsilon$-1-in-3 SAT, hopefully also in a region of parameters $(\epsilon, \gamma)$ where short-clause-like
heuristics or belief-propagation methods fail. They might also be used to compute quantities averaged over
instances in our random ensemble, particularly the complexity function which determines the SAT/UNSAT
transition.
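A minimal sketch of one survey-propagation update for the positive problem (no negations) follows. It assumes the reading of the update rules in which a variable's survey is built from products over incoming clause surveys, a clause sends "+" when both other variables are forced negative, "−" when exactly one of them is forced positive, and "0" otherwise, with the doubly-positive case discarded as a contradiction. Triples are ordered (minus, zero, plus); the function names are hypothetical.

```python
from math import prod

def variable_to_clause(qs):
    """Survey p = (p_minus, p_zero, p_plus) from the surveys q of the other clauses.

    qs is a list of triples (q_minus, q_zero, q_plus); an empty list is a leaf.
    """
    up = prod(q_p + q_0 for (_, q_0, q_p) in qs)      # no negative bias arrives
    down = prod(q_m + q_0 for (q_m, q_0, _) in qs)    # no positive bias arrives
    zero = prod(q_0 for (_, q_0, _) in qs)            # only zero biases arrive
    norm = up + down - zero
    if norm == 0:
        raise ValueError("all incoming combinations are contradictory")
    return ((down - zero) / norm, zero / norm, (up - zero) / norm)

def clause_to_variable(p_j, p_k):
    """Survey q = (q_minus, q_zero, q_plus) from the surveys p of the two other variables."""
    (jm, j0, jp), (km, k0, kp) = p_j, p_k
    norm = 1.0 - jp * kp                  # both forced positive is a contradiction
    if norm == 0:
        raise ValueError("hard contradiction: both variables forced positive")
    q_plus = jm * km                      # both others false: i must be true
    q_minus = jp * (1 - kp) + (1 - jp) * kp
    q_zero = j0 * k0 + j0 * km + jm * k0
    return (q_minus / norm, q_zero / norm, q_plus / norm)
```

The trivial paramagnetic survey (0, 1, 0) is a fixed point of both updates, and the `ValueError` in `clause_to_variable` corresponds to a vanishing normalization.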

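In the zero-temperature limit, the replicated free energy (35) becomes

$$-y\,\Phi(y) = \sum_{a}\ln\left(\int e^{-y\,\Delta E^{a\cup\partial a}}\right) - \sum_{i}(d_i - 1)\ln\left(\int e^{-y\,\Delta E^{i}}\right), \tag{44}$$

where from (9) and (36) we get the energy shifts

$$\Delta E^{a\cup\partial a} = -\max_{s_i,s_j,s_k}\left(h_{i\to a}s_i + h_{j\to a}s_j + h_{k\to a}s_k - E_{\{J_a\}}(s_i,s_j,s_k)\right) + \sum_{i\in\partial a}\sum_{b\in\partial i\setminus a}|u_{b\to i}|\,, \tag{45}$$

$$\Delta E^{i} = -\Big|\sum_{a\in\partial i}u_{a\to i}\Big| + \sum_{a\in\partial i}|u_{a\to i}|\,; \tag{46}$$

again both these energy shifts are non-negative integers.

Furthermore, in the $y \to \infty$ limit we distinguish only whether $\Delta E = 0$, in which case $\exp(-y\Delta E) = 1$, or $\Delta E > 0$,
in which case $\exp(-y\Delta E) = 0$. From eq. (44) we get for the complexity at zero energy

$$\Sigma(E = 0) = \sum_{a}\ln\left(\mathrm{prob}(\Delta E^{a\cup\partial a} = 0)\right) - \sum_{i}(d_i - 1)\ln\left(\mathrm{prob}(\Delta E^{i} = 0)\right), \tag{47}$$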

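where, calling $\mathcal{P}_0^{i} = \prod_{a\in\partial i} q_0^{a\to i}$ and $\mathcal{P}_\pm^{i} = \prod_{a\in\partial i}\left(q_\pm^{a\to i} + q_0^{a\to i}\right)$, and $\mathcal{P}_0^{i\to a}$, $\mathcal{P}_\pm^{i\to a}$ the same products restricted to $b\in\partial i\setminus a$,

$$\mathrm{prob}(\Delta E^{i} = 0) = \mathcal{P}_+^{i} + \mathcal{P}_-^{i} - \mathcal{P}_0^{i}\,; \tag{48a}$$

$$\mathrm{prob}(\Delta E^{a\cup\partial a} = 0) = \prod_{i\in\partial a}\left(\mathcal{P}_+^{i\to a} + \mathcal{P}_-^{i\to a} - \mathcal{P}_0^{i\to a}\right) - \prod_{i\in\partial a}\left(\mathcal{P}_+^{i\to a} - \mathcal{P}_0^{i\to a}\right) - \prod_{i\in\partial a}\left(\mathcal{P}_-^{i\to a} - \mathcal{P}_0^{i\to a}\right) - \sum_{i\in\partial a}\mathcal{P}_-^{i\to a}\prod_{j\in\partial a\setminus i}\left(\mathcal{P}_+^{j\to a} - \mathcal{P}_0^{j\to a}\right). \tag{48b}$$

The second equation collects the contributions from all combinations of arriving fields except the "contradictory"
ones $(+,+,+)$, $(-,-,-)$, $(+,+,0)$ and $(+,+,-)$ (plus permutations of the latter two). Note at
this point that all the equations (42)-(48) correctly do not depend on the choice of Hamiltonian ($\mathcal{H}$ or $\mathcal{H}'$).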
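The closed form for the clause term can be checked against a brute-force enumeration of the sign patterns of the three arriving fields. The sketch below is an illustration for the positive problem under assumptions: each variable carries a hypothetical triple `(P_minus, P_zero, P_plus)` of unnormalised cavity products, so that a strictly positive field has weight `P_plus - P_zero`, a strictly negative one `P_minus - P_zero`, and a zero field `P_zero`. A pattern is taken as contradictory if it has two or more "+" entries or is all "−".

```python
from itertools import product
from math import prod

def prob_clause_ok(fields):
    """Closed form: total weight minus the contradictory sign patterns."""
    plus = [pp - p0 for (_, p0, pp) in fields]     # strictly positive field
    minus = [pm - p0 for (pm, p0, _) in fields]    # strictly negative field
    total = prod(pp + pm - p0 for (pm, p0, pp) in fields)
    # (+,+,0) and (+,+,-) with permutations: the odd variable has weight P_minus.
    mixed = sum(fields[i][0] * prod(plus[j] for j in range(3) if j != i)
                for i in range(3))
    return total - prod(plus) - prod(minus) - mixed

def prob_clause_ok_enumerated(fields):
    """Same quantity by summing over the 27 sign patterns explicitly."""
    weights = [{'-': pm - p0, '0': p0, '+': pp - p0} for (pm, p0, pp) in fields]
    ok = 0.0
    for signs in product('-0+', repeat=3):
        contradictory = signs.count('+') >= 2 or signs == ('-', '-', '-')
        if not contradictory:
            ok += prod(w[s] for w, s in zip(weights, signs))
    return ok
```

Both functions return the same number for any admissible triples (those with `P_plus >= P_zero` and `P_minus >= P_zero`).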

C. 1RSB results for the phase diagram

In this section we give the results of the 1RSB cavity analysis for the $\epsilon$-1-in-3 SAT problem. In the first two
subsections we concentrate on the positive 1-in-3 SAT ($\epsilon = 0$). In the third one, IV C 3, we show results for
a general probability of negation.

1. Complexity as a function of connectivity 

To compute numerically the average value of the complexity from eq. (47) we first need to find a fixed point
of the survey propagation equations (42) and (43). We do that using the population-dynamics algorithm.
The result is in fig. 4.
FIG. 4: Average complexity density (logarithm of the number of states divided by the size of the graph) as a function of
mean degree $\gamma$ for the positive 1-in-3 SAT problem. At $\gamma_{\rm sp} = 1.822$ a nontrivial solution of the survey propagation
equations appears, with positive complexity. At $\gamma = \gamma^* = 1.8789 \pm 0.0002$ the complexity becomes negative: this
is the SAT/UNSAT transition. At $\gamma_p = 1.992$ the solution at zero energy ceases to exist. The inset magnifies the
region where the complexity crosses zero, together with the error bar for the SAT/UNSAT transition.



Below mean degree $\gamma_{\rm sp} = 1.822 \pm 0.001$ the survey propagation equations (42), (43) have only the trivial
paramagnetic solution, with $p_0 = q_0 = 1$ and $q_\pm = p_\pm = 0$ for all edges. At $\gamma_{\rm sp}$ a solution of the survey
propagation equations with positive complexity appears discontinuously. The emergence of this transition
far below the numerically known SAT/UNSAT threshold suggests that, in a whole interval of parameters
near the threshold, the phase space restricted to solutions is clustered into many pure states, i.e. that a
Hard-SAT phase exists. Furthermore, in that interval, there are also many metastable states, entropically
relevant, and local minima with positive energy cost separated by macroscopic barriers. This means that
local algorithms, like decimation heuristics or variants of annealed Monte Carlo, get trapped and are
unable to find any ground state in polynomial time. Nonetheless, a decimation procedure based on the
stationary distribution of the survey propagation equations is expected to work beyond this threshold.

Note that $\gamma_{\rm sp}$ is referred to as a "dynamical threshold" in [?, 3]; we stress that it is not this point which
is connected to a real dynamical transition. Neither is it the point where local algorithms cease
to work in polynomial time. We come back to comment on this point in the discussion, section V.

At mean degree $\gamma^* = 1.8789 \pm 0.0002$ the complexity becomes negative. Instead of having a.s. in each
instance an exponential number of clusters which contain at least one solution, the fraction of instances
having any cluster which contains at least one solution (i.e. the fraction of SAT instances) becomes
exponentially small. So this point identifies the SAT/UNSAT transition.

We are aware of two works where results from numerical simulations for this SAT/UNSAT threshold are
given. In [?] the authors conclude that the value of the threshold is $\gamma^* = 1.86 \pm 0.03$. In [23] they give $\gamma^* = 1.875 \pm 0.015$.
In fact, the latter do not give an error bar, so we estimated it from their fig. 4. Our result agrees with these
estimates and, as it is based on an analytical method, we reduce the error bar by one order of magnitude
with a very small numerical effort.

At mean degree $\gamma_p = 1.992 \pm 0.001$ the solution at zero energy ceases to exist. In the $y \to \infty$ limit the
population dynamics converges to a solution which shows a finite fraction of surveys of type $(p_-, p_0, p_+) =
(0, 0, 1)$. Then, with finite probability we would find two such surveys creating a contradiction, and the
normalization in (43) would then be zero. We call this situation a "hard contradiction".

Note that such a phenomenon does not occur in K-SAT or Coloring problems. Cavity equations deal
with the messages incoming to a clause from all neighbours but one. In both K-SAT and Coloring (and
in many other problems, like NAE-K-SAT, Vertex Covering, and so on), there is no way of making a
clause unsatisfied if one of the neighbouring variables is not restricted, and indeed a $\gamma_p$ threshold has
never appeared in the cavity analysis for these systems.

In order to obtain a non-singular solution above connectivity $\gamma_p$, we need to work with the equations at
finite y, which are able to account for the positive energy contributions. The results of the finite-y analysis
are shown in appendix C.
2. The stability analysis

FIG. 5: Left: The stability parameter of the second kind $\ln(\mu_{\rm II}(d))$ (D3) for positive 1-in-3 SAT for different connectivities
as a function of the length of the chain d. When the slope is negative, the 1RSB solution at this connectivity is stable against
bug proliferation, and vice versa. This happens for $\gamma > \gamma_{\rm II} = 1.838 \pm 0.002$. Right: The stability parameter of the
first kind (D9) as a function of connectivity for positive 1-in-3 SAT. The stability parameter is smaller than 1
for $\gamma < \gamma_{\rm I} = 1.948 \pm 0.002$; for these connectivities the 1RSB equations are stable against noise propagation.



In appendix D we introduce two stability parameters $\mu_{\rm I}$ (D9) and $\mu_{\rm II}$ (D3). Their meaning is analogous
to that of the replica-symmetric stability parameter $\Lambda$ (27). The 1RSB solution is stable if and only if
both $\mu_{\rm I} < 1$ and $\mu_{\rm II} < 1$.

The results for the stability parameter of the second kind $\mu_{\rm II}$ (D3) for positive 1-in-3 SAT are shown in
fig. 5 (left), for finite d. The extrapolation to $d \to \infty$ is done by a linear fit, which looks reasonable from the
data points. So our criterion is that, if the slope in the logarithmic plot is positive, the limit value $\mu_{\rm II}$ is
larger than 1, and vice versa. We estimate that 1RSB is "type-II" stable for $\gamma > \gamma_{\rm II} = 1.838 \pm 0.002$.
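The slope criterion can be sketched with an ordinary least-squares fit of the logarithm of the finite-d stability parameter against d. The function names are hypothetical and any data passed in are placeholders, not values from the paper.

```python
def fitted_slope(ds, log_mu):
    """Least-squares slope of log_mu against the chain length d."""
    n = len(ds)
    mean_d = sum(ds) / n
    mean_y = sum(log_mu) / n
    num = sum((d - mean_d) * (y - mean_y) for d, y in zip(ds, log_mu))
    den = sum((d - mean_d) ** 2 for d in ds)
    return num / den

def type_two_stable(ds, log_mu):
    """Negative slope: the d -> infinity limit is taken to be below 1 (stable)."""
    return fitted_slope(ds, log_mu) < 0.0
```

For example, synthetic points lying on the line `0.3 - 0.05 * d` give a slope of -0.05 and are classified as stable.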

More directly, for the stability parameter of the first kind $\mu_{\rm I}$ (D9), the results are shown in fig. 5 (right).
So we get that 1RSB is "type-I" stable for $\gamma < \gamma_{\rm I} = 1.948 \pm 0.002$ (the generous error estimate is due to
potential biases caused by finite population sizes).

The 1RSB solution may be correct only if both the stability parameters are smaller than one. For
positive 1-in-3 SAT this happens for connectivities in the range $1.838 \approx \gamma_{\rm II} < \gamma < \gamma_{\rm I} \approx 1.948$. So, in
particular, the SAT/UNSAT threshold $\gamma^*$ is in the range of stability, and its value is to be considered
exact. Note that such a situation, in which the SAT/UNSAT threshold falls into a narrow stability region,
is quite common, and it has been seen also in K-SAT [?] and Coloring [?].



3. 1RSB results for general probability of negation

We applied the techniques of section IV C 1 to the problem at finite $\epsilon$. As expected, all the critical
connectivities describe curves which are continuous at $\epsilon = 0$. We thus show, in figure 6 (left), the curves
$\gamma^*(\epsilon)$, $\gamma_{\rm sp}(\epsilon)$ and $\gamma_p(\epsilon)$, and, in the magnification on the right of fig. 6, also $\gamma_{\rm I}(\epsilon)$ and $\gamma_{\rm II}(\epsilon)$.

As $\epsilon$ approaches about 0.20, the interesting interval $\gamma_{\rm sp} < \gamma < \gamma_p$ becomes very narrow (fig. 6, right),
and the complexity $\Sigma$ very small, three orders of magnitude smaller than the analogous
values for $\epsilon = 0$. Above $\epsilon = 0.20$ we do not have sufficient numerical resolution to examine this region at
all.

In fig. 6 (right) we plot the four curves $\gamma_{\rm sp}(\epsilon)$, $\gamma_p(\epsilon)$, $\gamma_{\rm I}(\epsilon)$ and $\gamma_{\rm II}(\epsilon)$, shifted by the curve $\gamma^*(\epsilon)$, which
is used as a reference. This allows us to appreciate that the differences $(\gamma^* - \gamma_{\rm sp})(\epsilon)$ and $(\gamma_p - \gamma^*)(\epsilon)$
seem to vanish linearly at about $\epsilon_p = 0.21 \pm 0.01$, the two linear fits extrapolating to the same value of
$\epsilon$ with reasonable confidence.

Above $\epsilon_p$, as soon as a nontrivial solution of the (energetic finite-y) 1RSB cavity equations exists, it
immediately has a complexity $\Sigma(E)$ of the qualitative shape found for connectivities above $\gamma_p$ (for an example see
the figure in appendix C). We are thus led to conclude that in this interval the line $\gamma_p(\epsilon)$ should be taken as
the 1RSB prediction for the SAT/UNSAT line $\gamma^*(\epsilon)$. At about $\epsilon = 0.26 \pm 0.01$ the curve $\gamma_p(\epsilon)$ joins the
unit-clause upper bound.
We should add that, above $\epsilon \approx 0.07$, both stability criteria fail along the curve $\gamma^*(\epsilon)$ (and then, along
$\gamma_p(\epsilon)$, for $\epsilon > \epsilon_p$), so that the 1RSB prediction for the SAT/UNSAT transition is not expected to be exact,
but only an upper bound, in the range $0.07 < \epsilon < 0.2726$ [35, 36].




FIG. 6: Left: plot of the three curves $\gamma^*(\epsilon)$, $\gamma_{\rm sp}(\epsilon)$ and $\gamma_p(\epsilon)$ described in the text. We left for comparison the SCH
lower bound (dot-dashed curve) and the RS instability line (dotted curve with star data points). Right: The same
data, with connectivity plotted with respect to the SAT/UNSAT threshold prediction $\gamma^*(\epsilon)$. Also the stability
lines $\gamma_{\rm I}(\epsilon)$ and $\gamma_{\rm II}(\epsilon)$ are shown, and the interval of stability for the SAT/UNSAT curve is $\epsilon \in [0, 0.07 \pm 0.01]$. In
the inset, all the error bars are approximately as big as the point size.



V. DISCUSSION AND CONCLUSIONS 



We studied the average-case behaviour of 1-in-3 SAT in the random $\epsilon$-1-in-3 SAT ensemble, where
$\epsilon \in [0, 1/2]$ is the probability of negation. This generalizes the random (symmetric) 1-in-3 SAT problem
($\epsilon = 1/2$) and the random positive 1-in-3 SAT problem ($\epsilon = 0$), which is a special ensemble of Exact Cover.

Our main result is the phase diagram in fig. 1 and, magnified, in fig. 6 above. It fills the conceptual
gap between the symmetric problem, of polynomial average-case complexity both in the SAT and UNSAT
regions, and the positive problem, which shows a Hard-complexity phase around the SAT/UNSAT
threshold.

Concerning the SAT/UNSAT transition curve, $\gamma^*(\epsilon)$, we computed upper bounds coming from the
unit-clause technique (UC) and from the first moment method with restriction to the 2-core (1MM), and the lower
bound coming from the short clause heuristics (SCH). The UC and SCH bounds have been proven to coincide
on the interval $\epsilon \in [0.2726, 1/2]$, and thus determine the corresponding portion of the SAT/UNSAT line
rigorously.

All the other results are obtained with the non-rigorous cavity method. The replica-symmetric
calculations do not give us a better result for the SAT/UNSAT threshold than the UC and
SCH bounds, since their prediction has to be rejected above the RS stability line $\gamma_{\rm RS}$, fig. 3.

It is remarkable that a region of the phase diagram exists where the replica symmetry is broken, while
the short clause heuristics is proven to succeed a.s. in polynomial time. To our knowledge, such a feature
has not been proven in any of the previously studied models, while it has often been observed empirically
that some local algorithms, e.g. Walk-SAT [33, ?], work in linear time inside a phase with
replica-symmetry breaking. We are tempted to say that this result actually proves that the onset of
a non-trivial replica-symmetry-broken solution does not imply the onset of computational hardness
(unfortunately the cavity results are not rigorous, and the term "computational hardness" would have to
be defined properly before one could speak of a proof). However, we hope that this could be used to
study in a new way the nature of the replica-symmetry-broken phase. On the other hand, and as claimed
before, we are persuaded that a stable 1RSB solution suggests the existence of a Hard-SAT phase near
the SAT/UNSAT transition (nearer than the stability threshold). For a quantitative study of this point for
the coloring problem see [41]. The analysis of [41] should be repeated for the 1-in-K SAT problem in future
works.

Our main insights for the region with $\epsilon < 0.2726$ come from the one-step-Replica-Symmetry-Breaking
(1RSB) calculations, by analysis of the energetic zero-temperature limit of the 1RSB cavity equations (33),
in which we keep only the weights of the hard fields instead of the whole probability distribution.

For $\epsilon < 0.21$ we can locate, on the curves for the zero-energy complexity function $\Sigma(\gamma)$, fig. 4, the
connectivities $\gamma^*(\epsilon)$ at which $\Sigma$ vanishes, this corresponding to the SAT/UNSAT transition. The same
computation shows the existence of a nontrivial solution above $\gamma_{\rm sp}(\epsilon) < \gamma^*(\epsilon)$, thus predicting a whole
interval of Hard-SAT phase with many pure states. However, for the exact location of the "dynamical
transition" we would need to keep the information about the soft fields and compute when a nontrivial solution
of eq. (33) appears for the entropically dominating clusters, see [?]. Note here also that the inequality
$\gamma_{\rm sp} < \gamma_{\rm RS}$ for small $\epsilon$ is due to the discontinuity of the transition towards the nontrivial 1RSB phase, which
cannot be seen by the local RS stability analysis. The analysis of [?] would also show that a nontrivial 1RSB
solution exists everywhere in the region $\gamma > \gamma_{\rm RS}$. This analysis for 1-in-K SAT might be a direction
for future work.

Above the line $\gamma_p(\epsilon)$, fig. 6, the system shows a transition to a phase where the 1RSB solution at zero
energy ceases to exist, while it still exists above some value $E_{\min}(\gamma, \epsilon)$. This is due to the presence of
hard contradictions, a phenomenon specific to strongly constrained problems, like 1-in-K SAT, and to our
knowledge it is a newly observed fact. An interpretation of this transition may be that the SAT formulas
start to be subexponentially rare at the connectivity $\gamma_p$.

For $\epsilon > 0.21$ the nontrivial 1RSB solution at zero energy never exists, and we can trace only the curve
$\gamma_p$. This would be the 1RSB prediction for the SAT/UNSAT transition. This result suggests that for
$\epsilon > 0.21$ a Hard-SAT region might actually be absent. Specifying what sort of replica-symmetry-broken
solution is connected with the breakdown of local algorithms, like decimation heuristics or variants of
annealed Monte Carlo, is an important direction of future research.

We have checked the local stability of the 1RSB solution towards 2RSB. The result is that 1RSB is stable
only in a small region between the lines $\gamma_{\rm I}$ and $\gamma_{\rm II}$ in fig. 6. This means, among other things, that the
1RSB location of the SAT/UNSAT transition for $\epsilon < 0.07$ is likely to be exact. In particular, this is true
for the positive 1-in-3 SAT threshold, $\gamma^* = 1.8789$. For $0.07 < \epsilon < 0.2726$ the 1RSB result is unstable, so
the exact location of the SAT/UNSAT transition in that region remains an open question. We can only
conjecture that our 1RSB result is an upper bound, in analogy with proofs for other models in [35, 36].

Furthermore, it would be interesting to compare our results with the behaviour of the structurally related
(2 + p)-SAT problem [?], for which the 1RSB analysis is still missing.

Finally, as we mentioned in several places above, the 1RSB cavity approach allows for algorithmic
implementations. We started studying this aspect together with Elitza Maneva and Talya Meltzer,
and the results will be published elsewhere.
APPENDIX A: UPPER AND LOWER BOUNDS FROM UNIT-CLAUSE PROPAGATION AND DECIMATION HEURISTICS

Consider an instance drawn from an appropriate ensemble, and subject to a decimation algorithm
whereby in each time step a variable is set to $\pm 1$. Call X the discrete decimation time (number of variables
set, among the N total) and $C_i(X)$ the number of clauses remaining of length $i \ge 2$. Thus the initial
conditions for the instance are $X = 0$ and $C_i(0) = \frac{\gamma N}{3}\,\delta_{i,3}$. Assume for now that these quantities are sufficient
to describe the instance in the absence of clauses of "length 1" (unit clauses). If a variable
is fixed (decimated) in such an instance, the remaining instance involves a smaller number of literals,
so it is simplified in some respect. More importantly, some of the clauses are shortened, and may even
be reduced to unit clauses. A unit clause, consisting of only one literal, allows no ambiguity in the value its
variable must take in order for the instance to be SAT. The initial fixing of one variable by this process
forces the values of some other variables, which may again propagate, i.e. we have a branching process. So a single
binary choice could decrease the number of variables by a considerable amount, as a result of a cascade
of these unit-clause implications.

The justification for considering the instance at all times as described by $\{C_i(X)\}$ and X is the following.
For sufficiently simple decimation rules, the distribution of remaining variables within clauses will
be uniform and random at all X, and if N is sufficiently large the values of $C_i(X)$ are self-averaging.
Furthermore, at fixed clause length, the fraction of clauses with a given number of negations is the one
expected from an independent Bernoulli process: among clauses of length i, at all times there is a fraction
$\binom{i}{h}\epsilon^{h}(1-\epsilon)^{i-h}$ of clauses with h negations. All these elements are necessary to allow a sufficiently
concise dynamical description to make progress in the following sections. Among the various possible
heuristics, which determine the values set in the absence of unit clauses, one is typically interested in the
(suboptimal) subset of heuristics which preserve these decorrelation properties of the Poissonian ensemble,
so that a statistical analysis is achievable.
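The Bernoulli statement can be made concrete with a one-line helper; the function name `negation_fraction` is hypothetical and the formula is the binomial weight for h negations among the literals of a clause of a given length.

```python
from math import comb

def negation_fraction(length, h, eps):
    """Expected fraction of clauses of the given length that carry h negated literals."""
    return comb(length, h) * eps ** h * (1 - eps) ** (length - h)
```

For a 3-clause at $\epsilon = 1/2$ the fractions for h = 0, 1, 2, 3 are 1/8, 3/8, 3/8 and 1/8, and for $\epsilon = 0$ all clauses have h = 0.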

It is useful to consider the algorithm as partitioned into rounds, which consist of a single application 
of the heuristic rule (free step), followed by the cascade of unit-clause propagations (forced steps). In 
expectation, the number of variables fixed throughout a round of unit-clause propagation is described by 
a transition matrix depending on the clause distribution {Cj(X)}, which are constant to leading order 
during any "subcritical" round (defined below). The unit clauses generated in the first free step go on 
to generate other unit clauses and so forth, this can be described by a geometric series in the transition 
matrix M{X). Calling p = (j)t,Pf) and m = (rnT,mF) respectively the expectations for the numbers of 
variables fixed to (True, False) at the first level of the cascade (p), and on the whole cascade (m), we have 

m=p + M{x)p + M'^{x)p H = (/ - M{x)y^p . (Al) 

The matrix inverse above is justified by the fact that, for consistency of the approximations, we require
the round to be subcritical; thus we must consider the range of parameters where the modulus of the
largest eigenvalue of $M$ is smaller than 1. Subcriticality is also what justifies the approximation that $M(X)$
remains invariant (up to corrections of order $1/N$) during the cascade.
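The equivalence between the level-by-level cascade and the matrix inverse in (A1) can be checked numerically. The following is a minimal sketch, assuming a hypothetical 2x2 subcritical matrix `M_example` (its entries are illustrative, not values from the ensemble) and a free step that sets one variable to True.

```python
def mat_vec(M, v):
    """Multiply a 2x2 matrix (nested lists) by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


def cascade_by_series(M, p, levels=200):
    """Sum p + M p + M^2 p + ... level by level (truncated geometric series)."""
    total = list(p)
    term = list(p)
    for _ in range(levels):
        term = mat_vec(M, term)
        total = [total[0] + term[0], total[1] + term[1]]
    return total


def cascade_by_inverse(M, p):
    """Closed form m = (I - M)^{-1} p, using the explicit 2x2 inverse."""
    a, b = 1.0 - M[0][0], -M[0][1]
    c, d = -M[1][0], 1.0 - M[1][1]
    det = a * d - b * c
    return [(d * p[0] - b * p[1]) / det,
            (-c * p[0] + a * p[1]) / det]


# Hypothetical subcritical matrix: its largest eigenvalue is about 0.37.
M_example = [[0.20, 0.10],
             [0.30, 0.20]]
p_example = [1.0, 0.0]  # (expected True, expected False) at the first level

m_series = cascade_by_series(M_example, p_example)
m_closed = cascade_by_inverse(M_example, p_example)
```

The two results agree only while the largest eigenvalue modulus stays below 1; as it approaches 1 the truncated series needs ever more levels, which is the numerical counterpart of the cascade becoming critical.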

The transition matrix has two components. The first comes from $K$-clauses which are "broken" into
$K-1$ unit clauses, because the fixed variable already satisfies the original clause, so that all the other
$K-1$ variables must take the value not satisfying it. The second comes from 2-clauses
(if any) which are just "shortened": the fixed variable does not satisfy them, so the
left-over variable must, creating a unit clause. Since there are $C_i(X)$ clauses of length $i$ and
$N-X$ variables left in the instance, these two terms are combined in the expression

$$ M(X) = \frac{2}{N-X}\left[ \big(C_2(X)+3C_3(X)\big) \begin{pmatrix} \epsilon(1-\epsilon) & \epsilon^2 \\ (1-\epsilon)^2 & \epsilon(1-\epsilon) \end{pmatrix} + C_2(X) \begin{pmatrix} \epsilon(1-\epsilon) & (1-\epsilon)^2 \\ \epsilon^2 & \epsilon(1-\epsilon) \end{pmatrix} \right] . \tag{A2} $$



Here we identify that the unit-clause cascades on a large graph are a simple uncorrelated process, governed
by the spectrum of a certain finite-size "transition matrix" (in our case, 2x2). If all the eigenvalues have
$|\lambda_i| < 1$, the process is subcritical: the typical size of the cascades is $1/\min_i(1 - |\lambda_i|)$, and their average
size concentrates. Conversely, when the gap $1 - |\lambda_i|$ vanishes, a single cascade could visit a finite fraction
of the graph even in the large-$N$ limit, and thus could lead to a contradiction.

Also note that both the upper- and lower-bound arguments derived below are complemented by an
analysis of the concentration properties of the process [21], and by a non-rigorous argument on the
approximate decorrelation of distinct random restarts on a fixed instance. The vulgate version reads: "If
a random algorithm succeeds with finite probability $p$ on its first run, after $\sim n$ independent runs the
probability of success will be $1 - \exp(n\ln(1-p) + \cdots)$", where the dots stand for some function of
the correlations, small in $n/N$, caused by working with a fixed finite instance. We do not discuss
these complex technical points here.



1. Upper bound 



If, for a variable $i$ selected randomly from the initial instance, both the cascades initiated by $s(i) = +1$
and $s(i) = -1$ percolate, there is a finite probability that they result in a certificate of contradiction.
Thus the upper bound for the SAT/UNSAT transition comes from the requirement that the cascades
are on the edge of criticality already at time $X = 0$. At this point we have $C_2 = 0$ and $C_3 = \gamma N/3$, so
that we get

$$ M(0) = 2\gamma \begin{pmatrix} \epsilon(1-\epsilon) & \epsilon^2 \\ (1-\epsilon)^2 & \epsilon(1-\epsilon) \end{pmatrix} ; \qquad (\lambda_1, \lambda_2) = \big(4\gamma\epsilon(1-\epsilon),\, 0\big) . \tag{A3} $$

From this we see that a random instance is a.s. provable (in randomized linear time) to be unsatisfiable for
$\gamma$ larger than the percolation threshold

$$ \gamma_{uc}(\epsilon) = \frac{1}{4\epsilon(1-\epsilon)} . \tag{A4} $$
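As a quick numerical illustration of (A3) and (A4), the sketch below builds $M(0)$ and checks that its spectrum is $(1, 0)$ exactly at the percolation threshold. It assumes that $M(0)$ is $2\gamma$ times the rank-one matrix formed by the column $(\epsilon, 1-\epsilon)$ and the row $(1-\epsilon, \epsilon)$; the function names are illustrative only.

```python
import math


def m0_matrix(gamma, eps):
    """M(0), assumed to be 2*gamma * outer((eps, 1-eps), (1-eps, eps))."""
    col = (eps, 1.0 - eps)
    row = (1.0 - eps, eps)
    return [[2.0 * gamma * col[i] * row[j] for j in range(2)] for i in range(2)]


def eigenvalues_2x2(M):
    """Eigenvalues of a real 2x2 matrix with non-negative entries (real spectrum)."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def gamma_uc(eps):
    """Percolation threshold of eq. (A4)."""
    return 1.0 / (4.0 * eps * (1.0 - eps))


eps = 0.3
lam1, lam2 = eigenvalues_2x2(m0_matrix(gamma_uc(eps), eps))
```

At $\epsilon = 1/2$ the threshold evaluates to $\gamma_{uc} = 1$, and it diverges as $\epsilon \to 0$, where no cascade percolates at time zero.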



2. Lower bound 

The differential equations studied here are a generalization of those found by Kalapala and Moore [l^l 
for the case of positive 1-in-K-SAT.

The heuristic determines the nature of the free step in our rounds. The two rules examined here
are the random heuristic (RH[p]) and the short clause heuristic (SCH). In RH[p] a variable is chosen at random,
with uniform probability, from the remaining unassigned variables and set to true with probability $p$. In
SCH, a random 2-clause is selected (if any exists) and a random literal set to satisfy the clause (hence the
other literal is set to not satisfy it). If at some time no 2-clauses exist, a RH choice is performed, but this fact
will not be relevant in our statistical analysis: criticality of the cascade process will always arise after a
time interval throughout which an extensive number of short-length clauses have been present (except at
$X \sim O(1)$).

If at some time $X$, $p_T$ variables are set to True and $p_F$ to False in expectation, then $C_i$ changes
accordingly. If an $i$-clause contains the variable just fixed, it is reduced to an $(i-1)$-clause, and similarly
an $(i+1)$-clause can be reduced to an $i$-clause. Call $e = (\epsilon, 1-\epsilon)$, $p = (p_T, p_F)$, and $1 = (1, 1)$. Still in
expectation:

$$ C_i(X + 1\cdot p) = \Big( C_i(X) - a\,\delta_{i,2}\,\big(1 - \delta_{C_2(X),0}\big) \Big)\Big( 1 - \frac{i}{N-X}\,(1\cdot p) \Big) + \frac{(i+1)\,C_{i+1}(X)}{N-X}\,(e\cdot p) ; \tag{A5} $$

where $a = 0$ or 1 respectively in the case of RH[p] and SCH. The two heuristics are distinguished in that,
to initiate the cascades, for RH[p] we have $p_{RH[p]} = (p, 1-p)$, and for SCH we have $p_{SCH} = (1, 1)$. The
SCH value can be understood since setting one random literal in a 2-clause implies setting the other to
the opposite value, thus setting variables to either $\pm 1$ is equally likely in expectation.

A round can be described by incorporating the variables set in the forced steps. Suppose that during a
subcritical round $m_T$ variables are set to True and $m_F$ to False in expectation (including the free step),
and call $m$ the vector $(m_T, m_F)$. To leading order in $N - X$ the variation is

$$ C_i(X + 1\cdot m) = \Big( C_i(X) - a\,\delta_{i,2}\,\big(1 - \delta_{C_2(X),0}\big) \Big)\Big( 1 - \frac{i}{N-X}\,(1\cdot m) \Big) + \frac{(i+1)\,C_{i+1}(X)}{N-X}\,(e\cdot m) . \tag{A6} $$

A final simplification in the clause dynamics is to summarize the behavior by continuous variables $x = X/N$
and $c_i = C_i/N$. Under the hypothesis of subcriticality, $m/N$ is infinitesimal, and we attain a differential
equation description

$$ \dot c_i(x) = -a\,\delta_{i,2}\,\theta(c_2(x))\,\frac{1}{1\cdot m} + \frac{1}{1-x}\Big( -i\,c_i(x) + (i+1)\,\frac{e\cdot m}{1\cdot m}\,c_{i+1}(x) \Big) . \tag{A7} $$

For both RH[p] and SCH rules, the equation for $c_3(x)$ gives

$$ \frac{d}{dx}\,c_3(x) = -\frac{3\,c_3(x)}{1-x} \quad\Longrightarrow\quad c_3(x) = \frac{\gamma}{3}\,(1-x)^3 . \tag{A8} $$



Instead, for $c_2(x)$ the equation is nonlinear. Indeed, $m_T$ and $m_F$ are given by the combination
of equations (A1), (A2) and (A8), and thus depend on the unknown function $c_2(x)$ (besides, of course, $x$,
$\epsilon$ and $\gamma$). Using this expression within (A7) allows us to determine $c_2(x)$ by numerical integration, and
thence $\lambda_{\max}(x)$.

FIG. 7: On the left, profiles of $\lambda(x)$ along decimation time $x$, for RH[1], at various $\epsilon$ and at the corresponding critical
value of $\gamma$. In all the cases, the functions $\lambda(x)$ are concave (up to the limit value $\epsilon = 1/2$, where $\lambda(x) = 1 - x$).
For $\epsilon$ larger or smaller than the tricritical value $\epsilon^* = 0.272633$, the maximum of $\lambda(x)$ is achieved respectively at
$x = 0$ or at $x > 0$. On the right, critical curves $\gamma_{SCH}(\epsilon)$ and $\gamma_{RH[1]}(\epsilon)$, obtained through short-clause (SCH) and
random heuristic at optimal parameter $p = 1$ (RH[1]). For a comparison with other values and curves here out of
range, cf. figures 1 and 3.
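The numerical integration mentioned above can be sketched as follows for RH[p] (so $a = 0$). This is only an illustration under stated assumptions: the rescaled transition matrix is taken in the two-term form described for (A2), with the first term acting as the outer product of $(\epsilon, 1-\epsilon)$ and $(1-\epsilon, \epsilon)$; $c_3(x) = (\gamma/3)(1-x)^3$ is used in closed form; and a plain Euler step is chosen for brevity. Function names and the step count are hypothetical.

```python
import math


def transition_matrix(c2, c3, x, eps):
    """Assumed rescaled form of (A2): entries of M at decimation time x."""
    pref = 2.0 / (1.0 - x)
    A = [[eps * (1 - eps), eps ** 2], [(1 - eps) ** 2, eps * (1 - eps)]]
    B = [[eps * (1 - eps), (1 - eps) ** 2], [eps ** 2, eps * (1 - eps)]]
    return [[pref * ((c2 + 3 * c3) * A[i][j] + c2 * B[i][j]) for j in range(2)]
            for i in range(2)]


def largest_eigenvalue(M):
    """Largest eigenvalue of a 2x2 matrix with non-negative entries."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return (tr + math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2


def cascade(M, p):
    """m = (I - M)^{-1} p, valid only while the round is subcritical."""
    a, b, c, d = 1 - M[0][0], -M[0][1], -M[1][0], 1 - M[1][1]
    det = a * d - b * c
    return [(d * p[0] - b * p[1]) / det, (-c * p[0] + a * p[1]) / det]


def lambda_profile(gamma, eps, p=1.0, x_end=0.9, steps=9000):
    """Euler integration of (A7) for RH[p]; returns a list of (x, lambda(x))."""
    h = x_end / steps
    c2, profile = 0.0, []
    for n in range(steps + 1):
        x = n * h
        c3 = gamma / 3.0 * (1 - x) ** 3           # closed form of c_3(x)
        M = transition_matrix(c2, c3, x, eps)
        lam = largest_eigenvalue(M)
        profile.append((x, lam))
        if lam >= 1.0:                             # no longer subcritical: stop
            break
        m = cascade(M, [p, 1 - p])
        ratio = (eps * m[0] + (1 - eps) * m[1]) / (m[0] + m[1])  # (e.m)/(1.m)
        c2 += h * (-2 * c2 + 3 * ratio * c3) / (1 - x)
    return profile


eps = 0.4
gamma = 0.9 / (4 * eps * (1 - eps))   # 90% of the percolation threshold
profile = lambda_profile(gamma, eps)
lam_at_zero = profile[0][1]
lam_max = max(lam for _, lam in profile)
```

For this choice of $\epsilon$, above the tricritical value, the largest value of the profile sits at $x = 0$, in line with the statement that the cascades are most dangerous at the start of the decimation. A critical $\gamma$ for a given heuristic could then be located by bisecting on the condition $\lambda_{\max} = 1$.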

The best choice for the parameter $p$ in RH[p] is the one which creates the smallest cascades, i.e. the
one "more orthogonal" to the principal eigenvector $e = (\epsilon, 1-\epsilon)$, but compatible with the probabilistic
interpretation of $p$. Thus, in the whole interval $\epsilon \in [0, 1/2]$, the choice $p = 1$ is optimal.

Here we thus show the results for RH[1] and for SCH. The latter is always at least as good as the
former, and gives a lower bound of $\gamma_{SCH} = 1.6393$, while RH[1] attains $\gamma_{RH} = 1.6031$, for the case $\epsilon = 0$.
Kalapala and Moore calculated these quantities for positive 1-in-K SAT, with compatible results for the
$K = 3$ case (up to maybe a misprint exchanging RH[p] with RH[1 - p]).



3. Exact SAT/UNSAT threshold for $\epsilon > 0.2726$

This section proves the coincidence of the curves $\gamma_{SCH}(\epsilon)$ and $\gamma_{uc}(\epsilon)$ for $\epsilon > 0.2726$. It was shown in
the previous section that whenever the cascades remain subcritical we are in the Easy-SAT phase. The
criterion for the cascades to be subcritical at $x = 0$ is precisely $\gamma < \gamma_{uc}(\epsilon)$. It thus suffices to show that
the maximum (over the decimation time $x$) of $\max_i |\lambda_i(x)|$ is attained at $x = 0$. This is indeed what
happens in the interval $\epsilon \in [0.2726, 1/2]$.

Building on the previous section we will see that, for $\epsilon$-1-in-3-SAT and our heuristics, $\lambda(x)$ is a concave
function. So, the interval on which $\gamma_{uc}(\epsilon)$ and $\gamma_{SCH}(\epsilon)$ coincide is the one in which

$$ \left. \frac{d\lambda(x;\,\epsilon,\,\gamma = \gamma_{uc}(\epsilon))}{dx} \right|_{x=0} < 0 , \tag{A9} $$

the endpoint being determined by the corresponding equality.

It is possible to calculate the characteristic polynomial (and differentiate w.r.t. $x$). The expressions
thereby found can, however, only be evaluated exactly at $x = 0$. At this value we have expressions for
$c_i(x)$ and their derivatives in terms of the initial conditions and $m$.

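As we increase the connectivity towards the critical limit, $4\gamma\epsilon(1-\epsilon) = 1$, a further simplification arises in the
eigenvectors of $M$, which become $e$ and its orthogonal, the latter having null eigenvalue. Regardless of $p$
(which must have a component along $e$ in order to have positive components), we have

$$ \mathcal{F}(x,\epsilon) := \frac{e\cdot m}{1\cdot m} ; \qquad \mathcal{F}(0,\epsilon) = \frac{e\cdot e}{1\cdot e} = 1 - 2\epsilon(1-\epsilon) . \tag{A10} $$

Finally, the condition (A9) becomes

$$ \frac{\big(1 - 2\epsilon(1-\epsilon)\big)^3}{\big(2\epsilon(1-\epsilon)\big)^2} + \big(1 - 2\epsilon(1-\epsilon)\big) - 2 < 0 . \tag{A11} $$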
So that finally, after the change of variable $x = 2\epsilon(1-\epsilon)$, one gets the equation for the endpoint of the
interval

$$ 2x^3 - 2x^2 + 3x - 1 = 0 , \tag{A12} $$

whose only real solution gives $\epsilon = 0.272633$, or its symmetric point.
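The endpoint can be recovered numerically from the cubic (A12). The sketch below uses plain bisection on $x \in [0, 1/2]$ (the range of $2\epsilon(1-\epsilon)$ for $\epsilon \in [0, 1/2]$) and then inverts the change of variable on the branch $\epsilon \le 1/2$; the helper names are illustrative.

```python
import math


def endpoint_cubic(x):
    """Left-hand side of eq. (A12)."""
    return 2 * x ** 3 - 2 * x ** 2 + 3 * x - 1


def bisect(f, lo, hi, tol=1e-14):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# The cubic is negative at x = 0 and positive at x = 1/2.
x_star = bisect(endpoint_cubic, 0.0, 0.5)
# Invert x = 2*eps*(1 - eps) on the branch eps <= 1/2.
eps_star = 0.5 * (1.0 - math.sqrt(1.0 - 2.0 * x_star))
```

The derivative of the cubic, $6x^2 - 4x + 3$, has negative discriminant and is therefore always positive, which is why the real root is unique.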

To show that the properties at $x = 0$ are sufficient to determine $\gamma_{SCH}$ it is necessary to show that
whenever criterion (A11) is met, and $\lambda(0) < 1$, the algorithm is subcritical at all $x$. If we want a true
analytic proof, besides the numerical evidence of figure 7 (left), a method is to find a function $\bar\lambda(x)$ such
that

$$ \lambda(x) \le \bar\lambda(x) \le \lambda(0) , \tag{A13} $$

hence establishing the result.

Since we find that $\lambda(x)$ is a monotonically increasing function of $c_2(x)$, an upper bound $\bar c_2(x) \ge c_2(x)$
implies an upper bound on $\lambda(x)$ as well, which we take to be $\bar\lambda(x)$. The bound function $\bar c_2$ is defined by
replacing the complicated function $\mathcal{F}(x,\epsilon)$ by the constant value $\mathcal{F}(0,\epsilon)$ in the expression (A7), which is
then exactly solvable for all $x$ as

$$ \bar c_2(x) = \gamma\,x\,(1-x)^2\,\mathcal{F}(0,\epsilon) = \gamma\,x\,(1-x)^2\,\big(1 - 2\epsilon(1-\epsilon)\big) . \tag{A14} $$

For RH[1] and certain other heuristics this approximation can be shown to produce an upper bound for
$c_2(x)$, and yet be exact at $x = 0$ in both absolute value and derivative.

This then allows an exact expression for $d\bar\lambda/dx$ to be written in terms of $x$. Though the dependency on
$x$ remains complicated, it can be established that

$$ \frac{d\bar\lambda(x)}{dx} < 0 \quad\text{and}\quad \lambda(0) < 1 , \tag{A15} $$

exactly in the same interval of $\epsilon$ in which (A11) holds. This fact proves that our "local" analysis at
$x = 0$ was sufficient for the purpose of identifying the maximum over $x$ of $\lambda(x)$ in this interval.

As a final remark we note that the proof of this exact bound is indirectly reliant on the concavity of
the curves for all $\epsilon$ (figure 7, left). Interestingly we found that for $\epsilon$-1-in-k-SAT with $k > 3$ the curves are
not concave for some $\epsilon$; the gradient at $x = 0$ may be negative and yet the global maximum in $\lambda$
appears at $x > 0$. On first inspection a rigorous bound appears more challenging to obtain in these cases.



APPENDIX B: OTHER BOUNDS 



1. Upper bound from first moment method 



Here we obtain the statistical properties of the 2-core of random bipartite graphs, in the Erdos-Renyi
ensemble described in section II, with $K = 3$. Assuming $\gamma > 1/2$, the percolation transition, we solve
self-consistently for the probability that a given branch of the graph is not percolating.

We use "giant" or "small" as synonyms for "of size of order $N$" or "of size of order 1" respectively.
Indeed, for graphs in our ensemble, a.s. there is a single giant 2-core component. Each edge either is
attached on both sides to a small tree; or is attached to a small tree on one of the two extrema, and to the
giant 2-core on the other one (i.e. it is in the leaf-part of the giant component); or is connected to the
2-core through both extrema. Only in this last case is it in the 2-core of the graph.

Consider an incoming edge from a clause on the original graph. Call $q$ the probability that the part
of the graph "downstream" is a tree. The edge will be attached to a variable, participating in $k$ other
clauses ($k$ Poissonian distributed with rate $\gamma$). For each clause there will be two incoming edges, which must
also be connected to finite trees. Self-consistency will thus require that

$$ q = \sum_k e^{-\gamma}\,\frac{\gamma^k}{k!}\,q^{2k} = \exp\!\big(-\gamma(1-q^2)\big) . \tag{B1} $$

The two functions $q(\gamma)$ and $\gamma(q)$, inverse of each other, are both monotonic on our domains $q \in [0, 1]$,
$\gamma \in [1/2, +\infty)$. In particular, $\gamma(q)$ has an algebraic form

$$ \gamma(q) = \frac{-\ln q}{1 - q^2} , \tag{B2} $$
so that using $q$ as a parameter instead of $\gamma$ will simplify our equations.

The probability that an incoming branch from a variable is connected to a tree is $q^2$, as it is the
probability that both outgoing branches from the neighbouring clause are connected to a tree. Thus,
the average number of variables of coordination $k \ge 2$, $\langle N_k \rangle$, is proportional to a Poissonian with rate
$\gamma(1 - q^2) = -\ln q$. Then, there will be $\langle M_3 \rangle = (1-q)^3 M$ clauses remaining of degree 3, and $\langle M_2 \rangle =
3q(1-q)^2 M$ clauses reducing to 2-sat clauses; all the others are decimated by the leaf removal. The
number of edges is $E = \sum_k k N_k = 3M_3 + 2M_2$, so on average $\langle E \rangle = N(-\ln q)(1-q)$. All these averaged
quantities are concentrated.
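The counting of the 2-core can be cross-checked numerically: the number of core edges computed from the clause side, $3M_3 + 2M_2$, must match the variable-side expression $N(-\ln q)(1-q)$. The sketch below assumes $M = \gamma N/3$ clauses, reads the clause counts as $(1-q)^3 M$ and $3q(1-q)^2 M$, and solves the self-consistency condition $q = \exp(-\gamma(1-q^2))$ by fixed-point iteration from $q = 0$; all function names are illustrative.

```python
import math


def tree_probability(gamma, iterations=500):
    """Iterate q -> exp(-gamma*(1 - q^2)) from q = 0; for gamma > 1/2 this
    converges to the nontrivial root q < 1."""
    q = 0.0
    for _ in range(iterations):
        q = math.exp(-gamma * (1.0 - q * q))
    return q


def core_statistics(gamma):
    """Per-variable densities of core clauses and edges, for K = 3."""
    q = tree_probability(gamma)
    m_total = gamma / 3.0                      # assumed clauses per variable, M/N
    m3 = (1.0 - q) ** 3 * m_total              # clauses keeping all three variables
    m2 = 3.0 * q * (1.0 - q) ** 2 * m_total    # clauses reduced to length two
    edges_from_clauses = 3.0 * m3 + 2.0 * m2
    edges_from_variables = -math.log(q) * (1.0 - q)
    return q, m3, m2, edges_from_clauses, edges_from_variables


q, m3, m2, e_cla, e_var = core_statistics(1.5)
```

The two edge counts coincide only at the fixed point, since their ratio is $\gamma(1-q^2)/(-\ln q)$; the agreement is therefore also a check that the iteration has converged.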

Consider the ensemble of configurations $s$, in "spin" notation as in the rest of the paper. Call $x_k$ the
fraction of variables of degree $k$ which take value $+1$: the space of configurations is thus described by the
infinite vector $\{x_k\}_{k \ge 2}$, with each $x_k$ in $[0, 1]$, and a vector $\{x_k\}$ comes with an entropy

$$ S_{var}(\vec x) = \sum_k N_k\,h(x_k) , \tag{B3} $$

where we use the common two-state entropy function $h(x) = -x\ln x - (1-x)\ln(1-x)$. Denote by $\bar x$ the
fraction of incoming edges from variables assigned value $+1$

$$ \bar x = \frac{1}{E}\sum_k k\,N_k\,x_k , \tag{B4} $$

and by $p$ the fraction of edges $(ai)$ such that $J_{ai} s_i = +1$, and hence $p = (1-\epsilon)\bar x + \epsilon(1-\bar x) = \epsilon + (1-2\epsilon)\bar x$.

The probability that a 1-in-3-clause is satisfied is thus $3p(1-p)^2$, and the probability that a 2-sat clause
is satisfied is $1 - p^2$. So we get for the entropy term coming from clauses

$$ S_{cla}(p(\vec x)) = M_3 \ln\!\big(3p(1-p)^2\big) + M_2 \ln(1 - p^2) . \tag{B5} $$

The upper bound on the SAT/UNSAT transition is achieved by the line, in the $(q, \epsilon)$ plane (with range
$[0, 1] \times [0, 1/2]$), where the total (intensive) entropy $S(q, \epsilon)$ vanishes:

$$ S(q,\epsilon) = \max_{\vec x} S(q,\epsilon;\vec x) ; \qquad S(q,\epsilon;\vec x) = S_{var}(\vec x) + S_{cla}(p(\vec x)) . \tag{B6} $$

This variational problem is infinite dimensional, thus at first sight intractable. Instead, stationarity with
respect to $x_k$ produces

$$ \frac{1}{k}\ln\frac{x_k}{1-x_k} = (1-2\epsilon)\left( \frac{1-q}{3(1+q)}\,\frac{1-3p}{p(1-p)} - \frac{q}{1+q}\,\frac{2p}{1-p^2} \right) =: y(p) . \tag{B7} $$


Remarkably, a single parameter $y$ describes the family of possibly stationary vectors $\{x_k\}$, this being a
residue of the original independence of the Poisson ensemble.

Then, we can get $p$ self-consistently from $y$, through the $x_k$'s

$$ p(y) = \epsilon + (1-2\epsilon)\,\frac{1}{E}\sum_k k\,N_k\,x_k = \epsilon + (1-2\epsilon)\,\frac{1}{E}\sum_k \frac{k\,N_k}{1 + e^{-ky}} . \tag{B9} $$

So, the expression for the entropy $S(q, \epsilon)$ is given by the function

$$ S(q,\epsilon) = q\sum_{k\ge 2}\frac{(-\ln q)^k}{k!}\,h\!\left(\frac{1}{1+e^{ky}}\right) + \frac{(-\ln q)(1-q)}{3(1+q)}\Big[ (1-q)\ln\!\big(3p(1-p)^2\big) + 3q\ln(1-p^2) \Big] , \tag{B10} $$

where the values of $p$ and $y$ are determined by the (only) solution of the nonlinear system of two equations
(B7) and (B9). The set of points $(\epsilon, \gamma(q))$ where the function $S(q, \epsilon)$ vanishes describes a curve which
appears in figures 1 and 3.

Finally, we remark that a better upper bound can be achieved if one realizes that a further removal 
procedure is allowed: if a variable is connected only to 2-sat clauses, and with edges all of the same sign, 
then one can safely fix it, satisfying all the neighboring clauses. The new reduced instance is SAT if and 
only if the original one is, but the number of solutions is potentially smaller: as this decreases fluctuations, 
the bound is improved. 

For the case e = 0, this program is achieved in [23i] , although in the restriction to the variational space 
of $x_k$ all equal, and leads in that case to the (better) bound $\gamma_c < 1.932$.

2. Algorithmic upper bound through embedding into 3-XOR-SAT

An instance of 1-in-3-SAT is SAT only if the corresponding 3-XOR-SAT instance is SAT, where the 3-XOR
clauses also allow for the extra "spurious" configuration $(J_1\sigma_1, J_2\sigma_2, J_3\sigma_3) = (+, +, +)$.

Random Erdos-Renyi graphs with $K = 3$ have a finite core (under hypergraph leaf removal: if a variable
has degree 1, one removes it together with the incident clause) beyond a "dynamical" threshold $\alpha_d = 0.818$.
In a range $\alpha_d < \alpha < \alpha_c = 0.918$ there is an exponential number of solutions in the XOR-SAT problem,
even if restricted to the core. However, beyond the critical value $\alpha_c$ there are no longer solutions (other
than the single trivial one, with all $\sigma_i = +1$, in the case of "positive" instances with all $J_{ai} = +1$). These
results can be found, for example, in 0, [2^|. 

The last situation can be detected in polynomial time, by Gaussian elimination on the adjacency matrix:
if the rank equals the number of variables (more generally, if it is smaller by at most $O(\ln N)$), the solutions
on the core can be checked in polynomial time. As a.s. all the variables are forced to be $+1$, a fraction
of order 1 of the clauses in the core will be proven to be satisfiable only by the "spurious" configuration
$(+, +, +)$. This provides a certificate of unsatisfiability for the original instance in the random $\epsilon$-1-in-3-SAT
ensemble.
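The Gaussian-elimination step can be illustrated with a small routine over GF(2). This is a generic sketch, not the procedure used for the results above: it assumes each XOR clause has been encoded as a parity equation on 0/1 variables, with the equation stored as an integer bitmask (a hypothetical encoding chosen here for compactness).

```python
def gf2_rank_and_consistency(rows, rhs, n_vars):
    """Gaussian elimination over GF(2).

    rows[i] is an int whose bit j is 1 if variable j appears in equation i;
    rhs[i] is the parity (0 or 1) that equation i must have.
    Returns (rank, consistent).
    """
    # Store the right-hand side as an extra bit just above the variable bits.
    aug = [r | (b << n_vars) for r, b in zip(rows, rhs)]
    rank = 0
    for col in range(n_vars):
        pivot = next((i for i in range(rank, len(aug)) if (aug[i] >> col) & 1), None)
        if pivot is None:
            continue
        aug[rank], aug[pivot] = aug[pivot], aug[rank]
        for i in range(len(aug)):
            if i != rank and (aug[i] >> col) & 1:
                aug[i] ^= aug[rank]
        rank += 1
    # A row reduced to "0 = 1" signals an unsatisfiable XOR system.
    consistent = all(row != (1 << n_vars) for row in aug)
    return rank, consistent


# Three equations on four variables: x0+x1+x2 = 0, x1+x2+x3 = 1, x0+x3 = 0.
# The first two sum to x0+x3 = 1, contradicting the third.
rows = [0b0111, 0b1110, 0b1001]
result_unsat = gf2_rank_and_consistency(rows, [0, 1, 0], 4)
result_sat = gf2_rank_and_consistency(rows, [0, 1, 1], 4)
```

Each row operation is a single XOR on machine integers, so the cost is dominated by the number of equation pairs visited, which is at most cubic in the system size with this naive pivoting.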

So, at all $\epsilon$, for $\gamma > 3\alpha_c = 2.754$, one gets a.s. a certificate of unsatisfiability in randomized cubic time
(an upper bound for matrix triangulation). This proves that the Easy-UNSAT phase starts from a finite
$\gamma$ at all $\epsilon$.

The method strongly relies on the fact that a XOR-SAT core exists in the instance. Unfortunately,
this is not the case for the customary reductions of SAT to $\epsilon$-1-in-K-SAT, even at large $\alpha$ (as the former
constraints are much more sloppy than the latter, the reduction makes use of auxiliary leaf structures), so
the method does not extend to a randomized polynomial-time algorithm for finding a certificate in large-$\alpha$
3-SAT instances, in agreement with the widespread conjecture that such an algorithm cannot exist [26].



APPENDIX C: 1RSB AT POSITIVE ENERGY (FINITE y)

In this appendix we describe the 1RSB solution in the energetic zero-temperature limit, but at finite
value of the parameter $y$. That allows us to compute the dependence of the complexity $\Sigma$ on the energy $E$ for a given
probability of negation $\epsilon$ and connectivity $\gamma$.

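The survey propagation equations (42), (43) at finite $y$ become

$$ p_+^{i\to a} = \mathcal{N}_{i\to a}^{-1} \sum_r e^{-yr}\,\mathrm{prob}\Big( \Delta E^{i\to a} = r,\ \sum_{b\in\partial i\setminus a} u_{b\to i} > 0 \Big) ; \tag{C1a} $$

$$ p_-^{i\to a} = \mathcal{N}_{i\to a}^{-1} \sum_r e^{-yr}\,\mathrm{prob}\Big( \Delta E^{i\to a} = r,\ \sum_{b\in\partial i\setminus a} u_{b\to i} < 0 \Big) ; \tag{C1b} $$

$$ p_0^{i\to a} = \mathcal{N}_{i\to a}^{-1} \sum_r e^{-yr}\,\mathrm{prob}\Big( \Delta E^{i\to a} = r,\ \sum_{b\in\partial i\setminus a} u_{b\to i} = 0 \Big) ; \tag{C1c} $$

where $\mathcal{N}_{i\to a}$ is the normalization factor, and $\Delta E^{i\to a} = \sum_{b\in\partial i\setminus a} |u_{b\to i}| - \big|\sum_{b\in\partial i\setminus a} u_{b\to i}\big|$. When using
the Hamiltonian $\mathcal{H}$ (as we did), the probabilities of biases are given by the incoming probabilities of
fields as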

ir'-K-i^ [fe>o^'' +pr'p-ii +pr''po-^'') +i^j:;pj7:^-'] ; (C2c) 

where $\mathcal{N}_{a\to i}$ is again the normalization. If we were using $\mathcal{H}'$ of equation (3) instead, the last summand
in (C2c), the one carrying the exponential factor in $y$, would have appeared in (C2b).
Similarly, for the replicated free energy and energy we have

-y^{y) = E 1^ ( E "'^'^ Pi-ob(A£;'^^^" = r)) -"^{d, - l)\n(^Y^ e"^'' prob(Ai;* = r)) ; (C3) 



p. ^ V- ^ e-^-^ prob( Aii;'-^'-^'- ^ r) ^ . . ^ e-^- prob( Ai^' ^ r) 
gamma   y_ex   y_max   E_max      Sigma_max   y_gs   E_gs       y_min   E_min      Sigma_min
1.850   2.9    3.35    0.000410   0.00414     -      -          -       -          (+0.00247)
1.879   2.1    2.83    0.000969   0.00441     -      -          -       -          (-0.00015)
1.900   1.6    2.64    0.00143    0.00471     5.31   0.000282   -       -          (-0.00215)
1.935   1.0    2.40    0.00236    0.00480     4.31   0.000939   -       -          (-0.00570)
1.970   0.6    2.25    0.00336    0.00517     3.83   0.00176    -       -          (-0.00954)
2.000   0.4    2.09    0.00428    0.00534     3.60   0.00255    7.37    0.000149   -0.0111
2.050   0.1    2.04    0.00573    0.00611     3.27   0.00406    6.37    0.00107    -0.0121
2.100   0.1    1.94    0.00613    0.00815     3.04   0.00580    5.76    0.00232    -0.0130
2.150   0.1    1.86    0.00766    0.0104      2.88   0.00632    5.27    0.00377    -0.0138
2.200   0.1    1.67    0.00993    0.0126      2.72   0.00674    5.32    0.00549    -0.0155

TABLE I: Values at special points in the complexity vs. energy curves of figure 8. The parametrizations $\Sigma(y)$ and
$E(y)$ at various $\gamma$ exist for $y > y_{ex}(\gamma)$ (the estimate is an upper bound, being the first value for which a
non-paramagnetic RS solution emerged). All energies $E$ and complexities $\Sigma$ are extensive, and factors $1/N$ in the
table entries are understood. The triplets $(y, E, \Sigma)_{\max}$ and (if any) $(y, E, \Sigma)_{\min}$ are the two turning points, while
$(y, E, \Sigma = 0)_{gs}$ is the point where the complexity vanishes, along the physical branch of the curve. In the column
for $\Sigma_{\min}$, the first 5 values are instead $\Sigma(E = 0)$, as $E_{\min} = 0$ without turning point in that case.



The probabilities prob($\Delta E = r$) have to be computed algorithmically (sum over all combinations). We
found closed formulas only when $r = 0$ (no contradictions).



1. Complexity as a function of energy 

Again with the population-dynamics algorithm we solve eqs. (C1)-(C4) and from the solution obtain
the replicated potential $\Phi(y)$ (C3). The function $\Sigma(E)$, plotted in fig. 8, is obtained by fitting the function
$\Phi(y)$ and computing the Legendre transform (57) of the fit.

A non-paramagnetic solution of (C1), (C2) exists only above a value $y_{ex}(\gamma)$, which decreases monotonically,
from its asymptotics $y_{ex} \to +\infty$ for $\gamma \searrow \gamma_{sp}$ to $y_{ex} \to 0$ for $\gamma \to \infty$. The function $\Sigma(E)$ is parametrically
identified from the two functions $\Sigma(y)$ and $E(y)$ above. Assuming the latter are regular functions and noting that
$d\Sigma(E)/dE = y$, one also finds that $\Sigma(E)$ is regular, and convex or concave respectively if the parameter $y$ grows
towards the right or the left, up to (possibly) special points where the curve changes concavity (turning points).
Note that only the concave parts have a physical meaning, while the convex parts are the "non-physical"
portion of the formal solution.
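The parametric construction of $\Sigma(E)$ from a fitted potential can be sketched numerically. The block below assumes the convention $-y\Phi(y) = -yE + \Sigma$, so that $E = d(y\Phi)/dy$ and $\Sigma = y(E - \Phi)$; this convention is an assumption made for illustration and should be checked against the Legendre transform actually used in the main text. The fit `phi_fit` and its coefficients are hypothetical and are not the population-dynamics data.

```python
def legendre_from_potential(phi, y, dy=1e-5):
    """Return (E, Sigma) at a given y from a callable phi(y).

    Assumed convention: -y*phi(y) = -y*E + Sigma, hence
    E = d(y*phi)/dy and Sigma = y*(E - phi).
    """
    d_yphi = ((y + dy) * phi(y + dy) - (y - dy) * phi(y - dy)) / (2.0 * dy)
    energy = d_yphi
    sigma = y * (energy - phi(y))
    return energy, sigma


# Hypothetical smooth fit of the replicated potential.
E0, S0, A = 0.002, 0.004, 0.006


def phi_fit(y):
    return E0 - S0 / y + A / y ** 2


def slope_dSigma_dE(y, step=1e-3):
    """Finite-difference slope of the parametric curve (E(y), Sigma(y))."""
    e1, s1 = legendre_from_potential(phi_fit, y - step)
    e2, s2 = legendre_from_potential(phi_fit, y + step)
    return (s2 - s1) / (e2 - e1)


energy_at_3, sigma_at_3 = legendre_from_potential(phi_fit, 3.0)
slope_at_3 = slope_dSigma_dE(3.0)
```

For this toy fit one has $E(y) = E_0 - A/y^2$ and $\Sigma(y) = S_0 - 2A/y$ in closed form, so the slope of the parametric curve equals $y$, reproducing the relation $d\Sigma/dE = y$ quoted above.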

It turns out that, for $\gamma_{sp} < \gamma < \gamma_p$, there is a single turning point, from convex at small $y$ to concave
at large $y$, the values $E_{\max}$ and $y_{\max}$ labeling the values of $E$ and $y$ at this point. Instead, for $\gamma > \gamma_p$, a
second turning point, from concave to convex, arises at higher values of $y$, thus defining the values $E_{\min}$
and $y_{\min}$.

The labels "max" and "min" stand for the fact that there exist phases with energy $E$ (a number
approximately $\exp(\Sigma(E))$, if $\Sigma(E) > 0$, or in a fraction of instances of order $\exp(\Sigma(E))$, if $\Sigma(E) < 0$)
only for energies in a range $E_{\min} < E < E_{\max}$ (where $E_{\min} = 0$ for $\gamma < \gamma_p$), while we should interpret
that there are no pure phases with energy $E$ outside the range above, up to a subexponential fraction of
instances.

We find that, for all $\gamma$, $\Sigma(E_{\max}) > 0$, while for $\gamma > \gamma_p$, $\Sigma(E_{\min}) < 0$. An intermediate value $E_{\min} <
E_{gs} < E_{\max}$, corresponding to the one at which the complexity vanishes, exists for all $\gamma > \gamma^*$. It is
the value of minimum energy of a configuration in a typical instance sampled from the corresponding
ensemble, so it is important for the statistical properties of the "optimization" problem in the UNSAT
phase.

In table I we show the numerical values of the quantities described above, for a range of relevant
$\gamma$'s, obtained by population dynamics.
FIG. 8: Complexity as a function of energy for positive 1-in-3 SAT, for several different connectivities. $\gamma = 1.850$
is in the SAT region, $\gamma = 1.879$ is near the SAT/UNSAT transition, $1.900 \le \gamma \le 1.970$ is in the UNSAT region
and $2.000 \le \gamma \le 2.200$ is in the UNSAT region where a solution at zero energy does not exist.

FIG. 9: The values of $E_{\max}(\gamma)$, $E_{gs}(\gamma)$ and $E_{\min}(\gamma)$, and a quadratic fit which extrapolates to $E_{\max}(\gamma_{sp}) =
E_{gs}(\gamma^*) = E_{\min}(\gamma_p) = 0$. Marked on the $\gamma$ axis: $\gamma_{sp} = 1.822$, $\gamma^*$ and $\gamma_p = 1.992$.



Let us mention that, in our interpretation, the possibility of hard contradictions, peculiar to 1-in-3 SAT
and other highly constrained NP-complete problems and absent in the more intensively studied K-SAT
and Coloring, is responsible also for the existence of the second turning point, and the second unphysical
branch at high values of $y$, which is indeed a new feature of this system.



APPENDIX D: STABILITY OF 1RSB



In this appendix we describe how to check the self-consistency (stability) of the 1RSB solution, with
a treatment similar to the one in section III E for the replica-symmetric solution. We do it only for
the solution at zero energy, $y \to \infty$: as we will see, this is sufficient to determine the SAT/UNSAT
transition line in an interval of $\epsilon$ near $\epsilon = 0$, thus complementing the information we already have for
the neighbourhood of $\epsilon = 1/2$.

The stability analysis of the replica-symmetric solution investigates whether the replica-symmetric state tends
to split into exponentially many states. In the case of 1RSB we have two stability conditions to test.
The type I stability condition determines the tendency of 1RSB states to aggregate, and the type
II determines the tendency of the states to split. The names type I and type II come from [11], [29]. In
the case that the 1RSB solution is not stable, i.e. the states tend to split or aggregate, we would deduce
instability towards the 2-step Replica-Symmetry-Breaking (2RSB). However, this is not expected to be
the correct picture; what more probably occurs is what is called full-RSB [31], or something else
still unknown.

The method for computing these stabilities for models on random graphs was introduced in [29] and
applied to K-SAT [11], [13] and later to many other problems. For both types of 1RSB stability there exist
several equivalent analyses.

For the stability of type II we choose the bug proliferation method. This is developed concisely in section D 1
for the 1RSB solution of zero energy ($y \to \infty$). For more theoretical background on this method see
papers [29], [30], [32].

For type I instability it is possible to consider the convergence of the survey propagation equations (42), (43)
on a single graph. However, we choose to consider the noise propagation method, which uses a population-dynamics
technique similar to that used for the stability of the replica-symmetric solution, eq. (31). Again
we give just the formulas and results in section D 2; for a general explanation see [29], [30], [32].



1. Stability of the second kind, bug proliferation 



Suppose that, in a neighbourhood of edge $(a, i)$, with $j$ and $k$ being the other two variables incident
on $a$, an incoming warning $u_{b\to j}$ is changed from value $u$ to another value $u'$. Assume also that there are
$n - 1$ other clauses $c_2, \ldots, c_n$ (besides $a$ and $b$) incoming to node $j$, the warnings having values respectively
$u_2, \ldots, u_n$, and that there are $m$ other clauses $c'_1, \ldots, c'_m$ (besides $a$) incoming to node $k$, the warnings
having values respectively $v_1, \ldots, v_m$. Conditioned on the existence of the path $((bj), (ja), (ai))$, the other
coordinations $m$ and $n - 1$ are decorrelated and Poissonian distributed, with rate $\gamma$. Denote with the
letter $\mathcal{J}$ the set of parameters describing the characteristics of the graph in this neighbourhood

$$ \mathcal{J} = (n-1, m;\ J_{ai}, J_{aj}, J_{ak}) ; \tag{D1} $$

Labels $w$ and $w'$ will describe the value of the output warning, $u_{a\to i}$, under $u_{b\to j} = u$ and $u'$ respectively.




Define the six-dimensional transition matrix, in the index pairs $(u, u')$ and $(w, w')$, pairs of distinct
elements in $\{0, \pm 1\}^2$

vj{w^w'\u^u')^^ j2 <z?r' • • • lur' ^s'r" • • • ^^i::"' 

u2---u„e{o,±i} 

The quantity $V_{\mathcal{J}}(w \to w' | u \to u')$ is proportional to the probability that the change $u \to u'$ in warning $u_{b\to j}$
has induced a change $w \to w'$ in the warning $u_{a\to i}$. Here $\mathcal{F}^{a\to i}(u_{b\to j}, u_{c_2\to j}, \ldots, u_{c_n\to j}, u_{c'_1\to k}, \ldots, u_{c'_m\to k})$
implements the cavity equations at zero energy (12), with the appropriate disorder parameters
$(J_{ai}, J_{aj}, J_{ak})$, so the deltas force this value to be equal to $w$ and $w'$ in the two cases ($u_{b\to j} = u$, $u_{a\to i} = w$)
and ($u_{b\to j} = u'$, $u_{a\to i} = w'$). The delta in the energy shift is the residual of the reweighting factor
$\exp(-y\Delta E^{a\to i})$ in the limit $y \to \infty$ (note that it only appears on the pair $(u', w')$ of biases along the
chain). Normalizations $\mathcal{N} = \mathcal{N}_{j\to a}\mathcal{N}_{k\to a}\mathcal{N}_{a\to i}$ are those from eqs. (42), (43).

The transition matrix defined above determines whether a small fluctuation in the equilibrium distribution of
the fields is reinforced through the cavity iterations, thus leading to an instability. After $d$ iterations, the
modulus of the fluctuation changes on average by a factor $(2\gamma)^d\,\mathrm{Tr}(V_{\mathcal{J}_1} \cdots V_{\mathcal{J}_d})$, because there are on
average $(2\gamma)^d$ chains of length $d$ ending on a given edge, and the trace of a "chain" of transition matrices
estimates the influence of changing a bias at an edge at distance $d$ upstream. So we define the (finite-$d$
and $d \to \infty$) type-II stability parameters

$$ \mu_{II}(d) = 2\gamma\,\Big( \overline{\mathrm{Tr}(V_{\mathcal{J}_1} \cdots V_{\mathcal{J}_d})} \Big)^{1/d} ; \qquad \mu_{II} = \lim_{d\to\infty} \mu_{II}(d) ; \tag{D3} $$

where the average is over the connectivity distribution and the disorder in negations, i.e. the parameters
globally identified above with the letter $\mathcal{J}$. The various $\mathcal{J}$'s refer to the different segments of the chain,
and thus are independent. Again, the stability condition reads

$$ \mu_{II} < 1 . \tag{D4} $$



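The matrix $V$ for the $\epsilon$-1-in-3 SAT problem is six-dimensional, and we computed it for general realizations
of negation and connectivities. It has a block-triangular form (a change $0 \to \pm 1$ never induces changes
other than $0 \to \pm 1$, and a change $\pm 1 \to 0$ never induces $\pm 1 \to \mp 1$). Moreover two of the three 2x2
blocks on the diagonal, $B$, are equal and have elements always larger than those of the third block $B'$.
With rows labeled by the induced change $w \to w'$ and columns by the original change $u \to u'$, both in the
order ($0 \to \pm 1$, $\pm 1 \to 0$, $\pm 1 \to \mp 1$), and with $*$ denoting blocks that need not vanish,

$$ V(w \to w' | u \to u') = \begin{pmatrix} B & * & * \\ 0 & B & * \\ 0 & 0 & B' \end{pmatrix} . $$

So in the large-$d$ limit we need to analyze only the 2x2 block $B_{\mathcal{J}}$ of elements, propagating the "bug"
$0 \to \pm 1$. For $\mathcal{J} = (n-1, m;\ J_{ai}, J_{aj}, J_{ak})$, the block elements take the form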



B 




" = n ^0 n 9o : 
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J=2 



^=1 



(D5) 
(D6) 



and triplets $(q_0, q_-, q_+)$ for different edges are independently sampled from the stationary distribution.
Results for the type-II stability are presented in figure 5.



2. Stability of the first kind, the noise propagation 



In a similar way to bug proliferation, we write a sort of transfer matrix T 

(D7) 

The dependence of $q^{a\to i}$ on $q^{b\to j}$ is given by the survey propagation equations (42), (43).

We perform a population-dynamics analysis, where to every edge in the population is associated a triple
for the surveys, $(q_-, q_0, q_+)$, updated with the cavity equations (42), (43), and a pair of noise parameters,
$v = (v_+, v_-)$, which are updated according to

^^.^ ^ Tf}^;^-'+ T^^lv'-K (D8) 

The motivation is to compute whether a small change in the equilibrated incoming survey $q^{b\to j}$ is damped
under cavity iterations.

The analysis goes in complete analogy with the one in section III E. We initialize the noise parameters
with an arbitrary random procedure, and wait for equilibration of the distribution, up to an overall scaling,
$\|v^{(t)}\| := \sum_e (|v_+| + |v_-|)$, where $t$ denotes the iteration time, and the sum is over the population. The
stability parameter is now, for some time $t$ larger than equilibration,

$$ \mu_I = \frac{\|v^{(t+1)}\|}{\|v^{(t)}\|} , $$

and the stability condition is $\mu_I < 1$.